Abstract

In this paper, we consider a data matrix $X_N \in \mathbb{R}^{N \times p}$ where all the rows are i.i.d. samples in $\mathbb{R}^p$ with mean zero and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$. Here the population matrix $\Sigma$ is a finite-rank perturbation of the identity matrix. This is the "spiked population model" first proposed by Johnstone in [TU]. As $N, p \to \infty$ with $N/p \to \gamma \in (1, \infty)$, for the sample covariance matrix $S_N := X_N^T X_N / N$, we establish the joint distribution of the largest and the smallest few packs of eigenvalues. Inside each pack, the eigenvalues behave the same as the eigenvalues of a Gaussian matrix of the corresponding size. Among different packs, we also calculate the covariance between the entries of the Gaussian matrices. As a corollary, if all the rows of the data matrix are Gaussian, then these packs will be asymptotically independent. Also, the asymptotic behavior of the sample eigenvectors is obtained. Their local fluctuation is also Gaussian, with covariance explicitly calculated.

Key Words: Spiked population model, asymptotic sample spectrum

1 Introduction 

Suppose we have $N$ independently and identically distributed samples $x_1, \dots, x_N \in \mathbb{R}^p$. Here $N$ is the sample size and $p$ is the dimension of our data. We can then form the data matrix $X_N = (x_1, \dots, x_N)^T \in \mathbb{R}^{N \times p}$ and further define its sample covariance matrix

$$S_N := \frac{1}{N} X_N^T X_N. \tag{1.1}$$

In this paper we are interested in the asymptotic joint distribution of the extreme few eigenvalues and corresponding eigenvectors of the matrix $S_N$. Below are the assumptions of our model.

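• All the data vectors $x_i$ are independently and identically distributed with mean zero and covariance matrix $\Sigma \in \mathbb{R}^{p \times p}$. Here $\Sigma$ is a non-random positive definite matrix.

• For each vector $x_i$, the fourth moment satisfies $E|x_{ij}|^4 < \infty$ for $1 \le i \le N$, $1 \le j \le p$.

• $N, p \to \infty$ but their ratio satisfies $N/p = \gamma^2 + o(N^{-1/2})$, where $\gamma$ is a fixed constant in the interval $(1, \infty)$.

• We denote by $\ell_1 \ge \ell_2 \ge \cdots \ge \ell_p$ the eigenvalues of the matrix $\Sigma$. We assume that all of the $\ell_j$'s are equal to one except for finitely many of them. That is, there exist fixed integers $r_+, r_-$, independent of $N, p$, such that

$$\ell_1 \ge \ell_2 \ge \cdots \ge \ell_{r_+} > 1 > \ell_{p-r_-+1} \ge \ell_{p-r_-+2} \ge \cdots \ge \ell_p$$

and the rest of the eigenvalues are all equal to one.

• We can assume, without loss of generality, that our true covariance matrix is diagonal. That is, if we denote the $r \times r$ spiked block by $\mathrm{diag}\{\ell_1, \dots, \ell_{r_+}, \ell_{p-r_-+1}, \dots, \ell_p\} \in \mathbb{R}^{r \times r}$ with $r := r_+ + r_-$, then we can assume the true covariance matrix to be of the block-diagonal form

$$\begin{pmatrix} \mathrm{diag}\{\ell_1, \dots, \ell_{r_+}, \ell_{p-r_-+1}, \dots, \ell_p\} & 0 \\ 0 & I_{p-r} \end{pmatrix}.$$

With a slight abuse of notation, from here on $\Sigma$ also denotes the $r \times r$ spiked block. Hence we can decompose each sample $x_i$ as $x_i^T = (\xi_i^T, \eta_i^T)$, where $\xi_i \in \mathbb{R}^r$, $\eta_i \in \mathbb{R}^{p-r}$ and $\mathrm{Cov}(\xi_i) = \Sigma$, $\mathrm{Cov}(\eta_i) = I_{p-r}$. Our next assumption is that $\eta_i$ has i.i.d. entries and is independent of $\xi_i$.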

The model defined above is the "spiked population model" proposed in [19]. The unit eigenvalues represent pure noise, while the spiked eigenvalues represent true information. In real applications, we encounter such models quite often. In mathematical imaging (see [22]), the observed spectrum of the sample covariance matrix indeed has some detached eigenvalues, representing the possible scatterers in the region. As another example, in mathematical finance (see [23]), each row of our data matrix represents the correlated returns of each stock. The sample correlation matrix has some spiked large eigenvalues, representing the main factors driving the market, and some small eigenvalues, representing the linear dependence of these factors. Other possible applications include, but are not restricted to, speech recognition (see [21]), physics mixture (see [25]) and statistical learning (see [26]).
We define $(\lambda_1^{(N)}, \dots, \lambda_p^{(N)})$ as the eigenvalues of the sample covariance matrix $S_N$, where $\lambda_1^{(N)} \ge \lambda_2^{(N)} \ge \cdots \ge \lambda_p^{(N)}$. In the rest of the paper we will refer to the $\ell_i$ as the true eigenvalues and to the $\lambda_i^{(N)}$ as the sample eigenvalues. In the null case where $\Sigma = I$, a lot of properties are known. The empirical measure of $\{\lambda_i^{(N)}\}_{i=1}^p$, denoted by $F_N := \sum_{i=1}^p \delta(\lambda_i^{(N)})/p$, will almost surely converge in distribution to the Marcenko-Pastur law (see [11]), whose density is defined by

$$F(\lambda) := \frac{\gamma^2}{2\pi\lambda} \sqrt{(\lambda_+ - \lambda)(\lambda - \lambda_-)} \cdot 1_{\{\lambda_- \le \lambda \le \lambda_+\}} \tag{1.2}$$

where $\lambda_+ := (1 + \gamma^{-1})^2$ and $\lambda_- := (1 - \gamma^{-1})^2$. The support of the density, $[\lambda_-, \lambda_+]$, is often called the Marcenko-Pastur sea. Regarding the largest eigenvalue $\lambda_{\max}^{(N)} = \lambda_1^{(N)}$ and the smallest eigenvalue $\lambda_{\min}^{(N)} = \lambda_p^{(N)}$ of the sample covariance matrix $S_N$, Geman first proved that $\lambda_{\max}^{(N)} \to \lambda_+$ almost surely in [12], and later Silverstein proved that $\lambda_{\min}^{(N)} \to \lambda_-$ almost surely in [13]. That is to say, the largest and the smallest eigenvalues converge to the corresponding edges of the Marcenko-Pastur sea. For a second order approximation, Johansson in [TB] proved that the local fluctuation of $\lambda_{\max}^{(N)}$, properly scaled and centered, converges weakly to the Tracy-Widom law. Baker, Forrester and Pearce in [IB] proved a similar result for the smallest eigenvalue. We note that most of these results are universal, as is proved in [15], [20] and [21], to list a few.
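As a quick numerical illustration of the Marcenko-Pastur density (1.2), the following minimal sketch evaluates it and checks that it has total mass one and mean one. It assumes the convention $N/p \to \gamma^2$ (which is what the edges $(1 \pm \gamma^{-1})^2$ suggest); the function names `mp_edges`, `mp_density` and `mp_integral` are illustrative and do not come from the paper.

```python
import math

def mp_edges(gamma):
    """Edges (lambda_minus, lambda_plus) of the Marcenko-Pastur sea.

    Assumption: N/p -> gamma**2 with gamma > 1.
    """
    return (1.0 - 1.0 / gamma) ** 2, (1.0 + 1.0 / gamma) ** 2

def mp_density(x, gamma):
    """Density F(x) of the Marcenko-Pastur law, zero outside the sea."""
    lo, hi = mp_edges(gamma)
    if x <= lo or x >= hi:
        return 0.0
    return gamma ** 2 / (2.0 * math.pi * x) * math.sqrt((hi - x) * (x - lo))

def mp_integral(func, gamma, n=4000):
    """Approximate the integral of func(x) * F(x) over the sea.

    Uses the substitution x = c + d*cos(t), t in (0, pi), and the midpoint
    rule, which removes the square-root behaviour at the two edges.
    """
    lo, hi = mp_edges(gamma)
    c, d = 0.5 * (lo + hi), 0.5 * (hi - lo)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x = c + d * math.cos(t)
        total += func(x) * mp_density(x, gamma) * d * math.sin(t)
    return total * h

if __name__ == "__main__":
    g = 2.0
    print("edges:", mp_edges(g))
    print("total mass:", mp_integral(lambda x: 1.0, g))
    print("mean:", mp_integral(lambda x: x, g))
```

The angle substitution is only a convenience for the quadrature; any standard numerical integrator should give the same values.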

For the spiked population model where $\Sigma \ne I$, the phenomenon becomes much
more interesting. Recent research found that the non-null eigenvalues tend to pull 
the extreme sample eigenvalues out of the Marcenko-Pastur sea [A_, A+], provided 
that they are larger or smaller than certain thresholds. 

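In [3], Baik and Silverstein proved the almost sure limits of the extreme sample eigenvalues pulled out by the spikes. More precisely, for fixed $j$, almost surely

$$\lambda_j^{(N)} \to \begin{cases} \ell_j + \dfrac{\gamma^{-2}\ell_j}{\ell_j - 1} & \text{if } \ell_j > 1 + \gamma^{-1}, \\ \lambda_+ & \text{if } \ell_j \le 1 + \gamma^{-1}, \end{cases} \tag{1.3}$$

$$\lambda_{p-j}^{(N)} \to \begin{cases} \lambda_- & \text{if } \ell_{p-j} \ge 1 - \gamma^{-1}, \\ \ell_{p-j} + \dfrac{\gamma^{-2}\ell_{p-j}}{\ell_{p-j} - 1} & \text{if } \ell_{p-j} < 1 - \gamma^{-1}. \end{cases} \tag{1.4}$$

Note that this includes the case where some of the $\ell_j$ (or $\ell_{p-j}$)'s are the same. In this case, the corresponding $\lambda_j^{(N)}$ (or $\lambda_{p-j}^{(N)}$)'s just converge to the same limit specified in (1.3) and (1.4). We call these eigenvalues "packed".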
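The phase transition in the almost sure limits can be made concrete with a small sketch. It assumes the convention $N/p \to \gamma^2$ and a single spike $\ell \ne 1$; the function name `spike_limit` is hypothetical. A spike outside $[1 - \gamma^{-1}, 1 + \gamma^{-1}]$ pulls its sample eigenvalue out of the sea, otherwise the sample eigenvalue sticks to the nearest edge.

```python
def spike_limit(ell, gamma):
    """Almost sure limit of the sample eigenvalue attached to a spike ell != 1.

    Assumption: N/p -> gamma**2 with gamma > 1, so the thresholds are
    1 - 1/gamma and 1 + 1/gamma.
    """
    lam_minus = (1.0 - 1.0 / gamma) ** 2
    lam_plus = (1.0 + 1.0 / gamma) ** 2
    if abs(ell - 1.0) > 1.0 / gamma:
        # Spike is strong enough: eigenvalue separates from the sea.
        return ell + ell / (gamma ** 2 * (ell - 1.0))
    # Spike is too weak: eigenvalue converges to the edge of the sea.
    return lam_plus if ell > 1.0 else lam_minus

if __name__ == "__main__":
    for ell in (3.0, 1.5, 1.2, 0.5, 0.2):
        print(ell, spike_limit(ell, gamma=2.0))
```

At the thresholds the two branches agree, since $\ell + \gamma^{-2}\ell/(\ell - 1)$ evaluated at $\ell = 1 \pm \gamma^{-1}$ equals $(1 \pm \gamma^{-1})^2$, so the limit is continuous in $\ell$.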

But what is the second order approximation? Baik, Ben Arous and Péché in [T] observed the phase transition phenomenon of the asymptotic distribution of the largest sample eigenvalue $\lambda_{\max}^{(N)} = \lambda_1^{(N)}$ in the complex Gaussian case. They proved that if $\ell_1 > 1 + \gamma^{-1}$ (i.e., when $\lambda_1^{(N)}$ is pulled out of the sea), then the local fluctuation of $\lambda_1^{(N)}$ will be asymptotically the same as the largest eigenvalue of a $k \times k$ GUE matrix, where $k$ is the algebraic multiplicity of $\ell_1$. Moreover, in [4] Bai and Yao obtained the joint local fluctuation of the packed sample eigenvalues: when suitably centered and scaled, each pack of sample eigenvalues will asymptotically have the same distribution as the eigenvalues of some Gaussian matrix of the corresponding size.

We note that similar results can also be obtained for perturbed Wigner case, 
see [2], P and [7] for a reference. 

However, none of these results deals with the joint distribution of different packs of sample eigenvalues in our spiked population model. A naive guess is that different packs are asymptotically independent. However, this is not necessarily the case. By Bai and Yao in [4], the behavior of the local fluctuation of each pack of eigenvalues can be fully described by a Gaussian matrix. Thus, in order to establish the relationship between different eigenvalue packs, we just need to calculate the correlations between entries of these different Gaussian matrices. In this paper, we establish the correlation formula, which is not necessarily zero. Moreover, a sufficient condition for it to be zero is that the first four moments of $x_i$ behave as if the entries were independent, i.e., for example, $E(x_{ij}x_{ij'})^2 = E x_{ij}^2 \, E x_{ij'}^2$. As a corollary, if each row in our data matrix $X_N$ is Gaussian, then different packs of sample eigenvalues of the sample covariance matrix $S_N = X_N^T X_N/N$ will be asymptotically independent. But this is not true in general for other distributions. We also found that the limiting behavior of the local fluctuation of these eigenvalues only depends on the first four moments of $x_i$. We refer to this as the four moment principle.

Our next interest in this paper is to derive the asymptotic behavior of the eigenvectors. Under the assumption that all the $x_i$'s are Gaussian, Paul in [5] proved that the sample eigenvectors will also be inconsistent: the angle between the sample eigenvector and the true one will converge to a nonzero constant. In this paper, we remove the Gaussian assumption and establish the local fluctuation of the angle. For this universal case, we also observe the four moment principle: the asymptotic behavior depends only on the first four moments of $x_i$.

To state our main results below, we introduce the following two notations.

Definition 1.1 Let $X_n$ be a sequence of random variables. We say $X_n = O_p(1)$ (or bounded in probability) if for any $\epsilon > 0$, there exists some $M_\epsilon$ such that

$$P(|X_n| > M_\epsilon) < \epsilon, \quad \forall n \ge 1.$$

We say $X_n = o_p(1)$ (or decaying in probability) if for any $\epsilon > 0$ we always have

$$\lim_{n \to \infty} P(|X_n| > \epsilon) = 0.$$




The rest of this paper is organized as follows. In Subsections 1.1 and 1.2 we state our main theorems for the limiting behavior of the sample eigenvalues and eigenvectors. Their proofs are given in Section 2 and Section 3, respectively. All these proofs rely on the central limit theorem for the bilinear form, stated and proved in Section 4. Hence Section 4 only serves as a tool and can be regarded as a self-contained section. Finally, Section 5 serves as the conclusion of this paper.

1.1 Main result for eigenvalues 

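Under the assumptions of the spiked population model, we assume our true covariance matrix $\Sigma$ (more precisely, its $r \times r$ spiked block $\mathrm{Cov}(\xi_i)$) has the form

$$\Sigma = \mathrm{diag}\{\underbrace{a_1, \dots, a_1}_{r_1}, \underbrace{a_2, \dots, a_2}_{r_2}, \dots, \underbrace{a_q, \dots, a_q}_{r_q}\} \tag{1.5}$$

with $\sum_{i=1}^q r_i = r$. Also we define $q^*$ such that

$$a_1 > a_2 > \cdots > a_{q^*} > 1 > a_{q^*+1} > \cdots > a_q.$$

Since we are only interested in the isolated eigenvalues lying outside of the Marcenko-Pastur sea $[\lambda_-, \lambda_+]$, we can assume without loss of generality that none of these $a_j$'s are in the interval $[1 - \gamma^{-1}, 1 + \gamma^{-1}]$. Then by [3] there will be $r$ isolated sample eigenvalues lying outside of the Marcenko-Pastur sea. Recall that the sample eigenvalues are denoted by $\{\lambda_j^{(N)}\}_{j=1}^p$. Since we wish to denote both the greatest and the smallest few eigenvalues in a convenient way, with a slight abuse of notation we define $\{\lambda^{(j)}\}_{j=1}^r$ by

$$\lambda^{(j)} := \begin{cases} \lambda_j^{(N)}, & \text{if } 1 \le j \le \sum_{i=1}^{q^*} r_i, \\ \lambda_{p-r+j}^{(N)}, & \text{if } \sum_{i=1}^{q^*} r_i + 1 \le j \le r. \end{cases}$$

Then we know that these $\lambda^{(j)}$'s can be grouped into $q$ packs with

$$\lambda^{(\sum_{i=1}^{j-1} r_i + s)} \to p_{a_j} := a_j + \frac{\gamma^{-2} a_j}{a_j - 1} \quad \text{almost surely, for } 1 \le s \le r_j.$$

To consider the second order approximation, for any $1 \le j \le q$ define $Z_j^{(N)} \in \mathbb{R}^{r_j}$ by

$$Z_j^{(N)} := (Z_{j1}^{(N)}, \dots, Z_{j r_j}^{(N)}), \tag{1.6}$$

$$Z_{js}^{(N)} := \sqrt{N}\,\big(\lambda^{(\sum_{i=1}^{j-1} r_i + s)} - p_{a_j}\big), \quad 1 \le s \le r_j. \tag{1.7}$$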
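The following theorem describes the asymptotic behavior of the $r$-dimensional vector $Z^{(N)} := (Z_1^{(N)}, \dots, Z_q^{(N)})$. For brevity we write $k_j := \sum_{i=1}^{j-1} r_i$, and $\xi_k$ denotes the $k$-th entry of a generic sample $\xi_i$.

Theorem 1.1 As $N \to \infty$, $Z^{(N)}$ will weakly converge to the $r$-dimensional vector $Z := (Z_1, \dots, Z_q)$, partitioned in the same way as $Z^{(N)}$, such that

• For each $1 \le j \le q$, $Z_j \in \mathbb{R}^{r_j}$ has the same distribution as the $r_j$ eigenvalues of an $r_j \times r_j$ Gaussian symmetric matrix $(1 + \gamma^{-2} a_j m_3(p_{a_j}))^{-1} \cdot G^{(j)}$, with each entry $G^{(j)}_{st}$ being Gaussian distributed with mean zero.

• The intra-matrix covariances of these Gaussian entries are

$$\mathrm{Cov}(G^{(j)}_{st}, G^{(j)}_{uv}) = \omega_{jj}\Big(E[\xi_{k_j+s}\,\xi_{k_j+u}\,\xi_{k_j+t}\,\xi_{k_j+v}] - a_j^2\, 1_{s=t,u=v}\Big) + (\theta_{jj} - \omega_{jj})\Big(a_j^2\, 1_{s=v,u=t} + a_j^2\, 1_{s=u,t=v}\Big)$$

for any $1 \le s \le t \le r_j$, $1 \le u \le v \le r_j$.

• The inter-matrix covariances of these Gaussian entries are

$$\mathrm{Cov}(G^{(j)}_{st}, G^{(j')}_{uv}) = \omega_{jj'}\Big(E[\xi_{k_j+s}\,\xi_{k_{j'}+u}\,\xi_{k_j+t}\,\xi_{k_{j'}+v}] - a_j a_{j'}\, 1_{s=t,u=v}\Big)$$

for any $1 \le s \le t \le r_j$, $1 \le u \le v \le r_{j'}$ and for any $1 \le j \ne j' \le q$.

Here $\omega_{jj'}, \theta_{jj'}$ are defined by

$$\omega_{jj'} := \Big(1 + \frac{\gamma^{-2}}{a_j - 1}\Big)\Big(1 + \frac{\gamma^{-2}}{a_{j'} - 1}\Big), \qquad \theta_{jj'} := \frac{(a_j - 1 + \gamma^{-2})(a_{j'} - 1 + \gamma^{-2})}{(a_j - 1)(a_{j'} - 1) - \gamma^{-2}},$$

and $m_3(p_{a_j})$ is defined in (2.14).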

Remark 1.1 Write $k_j := \sum_{i=1}^{j-1} r_i$ and let $\xi_k$ denote the $k$-th entry of a generic sample $\xi_i$. In particular, if for any $j \ne j'$ and any $1 \le s, t \le r_j$, $1 \le u, v \le r_{j'}$,

$$E[\xi_{k_j+s}\,\xi_{k_{j'}+u}\,\xi_{k_j+t}\,\xi_{k_{j'}+v}] = a_j a_{j'}\, 1_{s=t,u=v}, \tag{1.8}$$

then these $q$ Gaussian matrices are independent of each other. In this case, the $q$ packs of eigenvalues are asymptotically independent of each other. If all the $\xi_i$'s are i.i.d. Gaussian, then (1.8) holds true, since for the Gaussian distribution uncorrelatedness implies independence. However, for a general distribution of the $\xi_i$'s, (1.8) no longer holds true. Hence in general these $q$ packs are not independent of each other.

Remark 1.2 We observe the four moment principle in Theorem 1.1. The local fluctuation of the sample eigenvalues $\lambda^{(j)}$ only depends on the first four moments of $\xi_i$. Similar phenomena have been observed many times in the random matrix literature.



1.2 Main result for eigenvectors 

In this subsection we only consider the case where all the eigenvalues of $\Sigma$ are distinct. That is, all the eigenvalues of $\mathrm{Cov}(\xi_i)$ have multiplicity one. In this case $q = r$ and

$$\Sigma = \mathrm{diag}\{a_1, \dots, a_r\}.$$

Again, since we are only interested in the isolated packs of eigenvalues, we can assume without loss of generality that all the $a_i$'s are sufficiently far away from 1, i.e. $|a_i - 1| > \gamma^{-1}$.

Recall that we can decompose each sample point $x_i$ as $x_i^T = (\xi_i^T, \eta_i^T)$. Here we use the same partition for the sample eigenvectors. Indeed, we denote the $j$-th eigenvector by $(u^{(j)T}, v^{(j)T})^T$. That is,

$$S_N \begin{pmatrix} u^{(j)} \\ v^{(j)} \end{pmatrix} = \lambda^{(j)} \begin{pmatrix} u^{(j)} \\ v^{(j)} \end{pmatrix}, \qquad u^{(j)} \in \mathbb{R}^r, \ v^{(j)} \in \mathbb{R}^{p-r},$$

where $S_N$ is the sample covariance matrix. Since the eigenvectors are unique only up to scaling, we require that $\|u^{(j)}\|_2 = 1$ and $u_j^{(j)} > 0$. Here $u_j^{(j)}$ is the $j$-th entry of $u^{(j)}$. Also, for notational convenience, we denote by $u_{-j}^{(j)}$ the $(r-1)$-dimensional vector obtained by deleting the $j$-th entry of $u^{(j)}$.

Note that the $j$-th true eigenvector is $(e_j^T, 0^T)^T$, where $e_j = (0, \dots, 0, 1, 0, \dots, 0)^T \in \mathbb{R}^r$ is a vector of all zeros except the $j$-th entry being one. Hence intuitively we should have $u^{(j)} \to e_j$. This is indeed the case by the following theorem.

Theorem 1.2 We have $u^{(j)} \to e_j$ in probability. Furthermore, $u_j^{(j)} = 1 + O_p(N^{-1})$. For the second order approximation of the $u_{-j}^{(j)}$'s we have the following result.
Jointly,

Pa, [Otj - Ui) 

1 + 7 ^ajm3{paj) ■' 
Here the entries {G\'' }ij are Gaussian distributed with mean zero and covariance 

Cov(G'i ,G'.f ') = ujjj> lK[^i^ii^j^j>]-aiai'li=j^i>=j' +{6jji-ujjji)[aiajli=j'j=i>+aiajli=i'j=j' 
where $\omega_{jj'}, \theta_{jj'}$ and $m_3(p_{a_j})$ are defined in Theorem 1.1.



Remark 1.3 If in particular $E[\xi_i \xi_j^3] = 0$ for $i \ne j$, then the local fluctuations of $u_{-j}^{(j)}$ and $\lambda^{(j)}$ will be asymptotically independent. This condition is satisfied if we assume that the sample points $x_i$ are i.i.d. Gaussian. But it is not necessarily true for general distributions.

From Theorem 1.2 we observe that the $u$ part will be asymptotically consistent. But what about the whole eigenvector $(u^T, v^T)^T$? The next theorem shows that the whole eigenvector will not be consistent: the angle between the sample eigenvector and the true eigenvector will converge to a non-vanishing constant.

Theorem 1.3 Denote by $\theta^{(j)}$ the angle between the $j$-th sample eigenvector $(u^{(j)T}, v^{(j)T})^T$ and the $j$-th true eigenvector $(e_j^T, 0^T)^T$. Then for all $1 \le j \le r$, we have that jointly

N 3/2 ^ f ^, . 1 

l + 7-2m3(p„.)a,) -ViV cos^(^) 

where {Gj,HjiYj^^ are jointly normal with mean zero and covariance 

Cov{Gj, Gj>) = ujjj'lE^]^], - a^ay] + 2(4'' - ujjj>)a'^jlj=j^, 
CoviH,, Hy) = Ci/[Ee|eJ - a,ay] + 2(r,y - QMh=r^ 
Cov(Gj, Hy) = Kjj, [E^|^|, - ajaj>] + 2ijljj, - Kjj,)a]lj=j,. 



Here ^jj',Ojj'-Cjj'^'T~jj'^ i^jj' and fijj> are defined in /i2.26\) . \2.2'1\) , h3.3^) . /i3.36\) . 
\3.31\) and Ii3. 38\) . respectively. The functions m^^pa ),m4^{pa) are defined in 
(2J^ and (3J^. 



Remark 1.4 Here Gj is exactly the same as G in Theorem \1.^ This can 
build the correlation between the local fluctuation of the sample eigenvectors and 
the angles. 

Remark 1.5 Once again, in Theorem 1.2 and Theorem 1.3 we observe the four moment principle. The local fluctuation of the sample eigenvectors as well as of the angles only depends on the first four moments of $\xi_i$.



2 Asymptotic behavior for eigenvalues 

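Following the notation in Section 1, each sample point can be decomposed as $x_i^T = (\xi_i^T, \eta_i^T)$. Our data matrix $X_N$ has the form

$$X_N = \begin{pmatrix} x_1^T \\ \vdots \\ x_N^T \end{pmatrix} = \begin{pmatrix} \xi_1^T & \eta_1^T \\ \vdots & \vdots \\ \xi_N^T & \eta_N^T \end{pmatrix} \tag{2.1}$$

where $\xi_i, \eta_i$ are independent, $\mathrm{Cov}(\xi_i) = \Sigma$ and $\mathrm{Cov}(\eta_i) = I$. We introduce the notation

$$X_\xi := \frac{1}{\sqrt{N}} \begin{pmatrix} \xi_1^T \\ \vdots \\ \xi_N^T \end{pmatrix}, \qquad X_\eta := \frac{1}{\sqrt{N}} \begin{pmatrix} \eta_1^T \\ \vdots \\ \eta_N^T \end{pmatrix}. \tag{2.2}$$

Then $X_N = \sqrt{N}\,(X_\xi, X_\eta)$. Our sample covariance matrix is then

$$S_N = \frac{1}{N} X_N^T X_N = \begin{pmatrix} X_\xi^T X_\xi & X_\xi^T X_\eta \\ X_\eta^T X_\xi & X_\eta^T X_\eta \end{pmatrix}. \tag{2.3}$$

To calculate its eigenvalues, we calculate

$$\det(\lambda I - S_N) = \det \begin{pmatrix} \lambda I - X_\xi^T X_\xi & -X_\xi^T X_\eta \\ -X_\eta^T X_\xi & \lambda I - X_\eta^T X_\eta \end{pmatrix} = \det(\lambda I - X_\eta^T X_\eta) \cdot \det(\lambda I - X_\xi^T A_N(\lambda) X_\xi) \tag{2.4}$$

where

$$A_N(\lambda) = I + X_\eta (\lambda I - X_\eta^T X_\eta)^{-1} X_\eta^T. \tag{2.5}$$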
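The factorization of $\det(\lambda I - S_N)$ is an instance of the Schur complement formula. The following worked step is a sketch under the assumption that $D := \lambda I - X_\eta^T X_\eta$ is invertible; the shorthand $D$ is introduced here only for illustration.

```latex
\begin{aligned}
\det\begin{pmatrix}
  \lambda I - X_\xi^T X_\xi & -X_\xi^T X_\eta \\
  -X_\eta^T X_\xi & D
\end{pmatrix}
&= \det(D)\,\det\!\big(\lambda I - X_\xi^T X_\xi
   - X_\xi^T X_\eta D^{-1} X_\eta^T X_\xi\big) \\
&= \det(D)\,\det\!\big(\lambda I
   - X_\xi^T (I + X_\eta D^{-1} X_\eta^T) X_\xi\big) \\
&= \det(\lambda I - X_\eta^T X_\eta)\,
   \det\!\big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big).
\end{aligned}
```

The first equality is $\det\begin{pmatrix} A & B \\ C & D \end{pmatrix} = \det(D)\det(A - BD^{-1}C)$, and the last one uses the definition of $A_N(\lambda)$. The benefit of this reduction is that the $p \times p$ eigenvalue problem is replaced by an $r \times r$ one, with $r$ fixed.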



Since $X_\eta$ consists of pure noise, by [11] the eigenvalues of $X_\eta^T X_\eta$ will converge to the Marcenko-Pastur law, with support $[\lambda_-, \lambda_+]$. However, as $N \to \infty$, if $\ell_j > 1 + \gamma^{-1}$, by [3] we have $\lambda^{(j)} \to \ell_j + \gamma^{-2}\ell_j/(\ell_j - 1)$, which does not lie in the Marcenko-Pastur sea $[\lambda_-, \lambda_+]$. A similar statement is true for the smallest few eigenvalues. Hence we know that almost surely $\det(\lambda^{(j)} I - X_\eta^T X_\eta) \ne 0$. This implies that, without loss of generality, we can safely assume that $\det(\lambda^{(j)} I - X_\xi^T A_N(\lambda^{(j)}) X_\xi) = 0$. Hence from now on we can restrict our attention to the equation

$$\det(\lambda I - X_\xi^T A_N(\lambda) X_\xi) = 0. \tag{2.6}$$



As a first observation, from (2.5), the matrix $A_N(\lambda)$ is a function of $\lambda$ and $X_\eta$ only. If $\lambda$ is non-random, then $A_N(\lambda)$ is independent of $X_\xi$. Moreover, $X_\xi^T A_N(\lambda) X_\xi$ is a bilinear form of the $\xi_i$'s. In Section 4 we will derive a central limit theorem for such bilinear forms, which will be the core of the whole proof. Indeed, we can prove in Subsections 2.2 and 2.3 that, for fixed $\lambda$, $X_\xi^T A_N(\lambda) X_\xi$ will converge to a diagonal matrix with Gaussian local fluctuations. Based on this, we can get the first and second order approximations of $\lambda^{(j)}$.

Now let

$$\mathrm{Cov}(\xi_i) = \mathrm{diag}\{\ell_1, \ell_2, \dots, \ell_r\} = \mathrm{diag}\{\underbrace{a_1, \dots, a_1}_{r_1}, \underbrace{a_2, \dots, a_2}_{r_2}, \dots, \underbrace{a_q, \dots, a_q}_{r_q}\}. \tag{2.7}$$

Here $r_1 + \cdots + r_q = r$. Also for convenience we define the index set

$$I_j := \{i \in \{1, \dots, r\} : \ell_i = a_j\}, \quad j = 1, \dots, q. \tag{2.8}$$

We are interested in the eigenvalues in the $j$-th pack, namely, $\lambda^{(r_1 + \cdots + r_{j-1} + i)}$ for $1 \le i \le r_j$. If $|a_j - 1| > \gamma^{-1}$, then we know that almost surely as $N \to \infty$

$$\lambda^{(r_1 + \cdots + r_{j-1} + i)} \to p_{a_j} := a_j + \frac{\gamma^{-2} a_j}{a_j - 1}. \tag{2.9}$$

The local fluctuation is of order $N^{-1/2}$. Hence in our main equation (2.6) we set

$$\lambda = p_{a_j} + \frac{x_j}{\sqrt{N}}, \tag{2.10}$$

where $x_j = O(1)$ is a non-random number independent of $N$. This time

$$\lambda I - X_\xi^T A_N(\lambda) X_\xi = \Big(\lambda I - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma\Big) - \Big(X_\xi^T A_N(\lambda) X_\xi - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma\Big) \tag{2.11}$$

can be written as the sum of two parts. For the first part, $\lambda I - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma$ is a diagonal matrix. For the second part, we shall use Theorem 4.2 to prove that $\sqrt{N} \cdot (X_\xi^T A_N(\lambda) X_\xi - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma)$ is approximately Gaussian. But first, we need to derive some properties of $A_N(\lambda)$ to ensure that all the conditions in Theorem 4.2 are satisfied.

2.1 Properties of $A_N(\lambda)$

In this subsection, we derive some lemmas about our matrix $A_N(\lambda)$, which serve as preparation steps for the main theorem.

First we introduce some constants. Let $[\lambda_-, \lambda_+]$ be the Marcenko-Pastur sea. For $1 \le j, j' \le q$ define

$$m_1(p_{a_j}) := \int_{\lambda_-}^{\lambda_+} \frac{x}{p_{a_j} - x} F(x)\,dx = \frac{1}{a_j - 1}, \tag{2.12}$$

$$m_2(p_{a_j}, p_{a_{j'}}) := \int_{\lambda_-}^{\lambda_+} \frac{x^2}{(p_{a_j} - x)(p_{a_{j'}} - x)} F(x)\,dx = \frac{(a_j - 1)(a_{j'} - 1) + \gamma^{-2} a_j a_{j'} - \gamma^{-2}}{(a_j - 1)(a_{j'} - 1)\big((a_j - 1)(a_{j'} - 1) - \gamma^{-2}\big)}, \tag{2.13}$$

$$m_3(p_{a_j}) := \int_{\lambda_-}^{\lambda_+} \frac{x}{(p_{a_j} - x)^2} F(x)\,dx = \frac{1}{(a_j - 1)^2 - \gamma^{-2}}. \tag{2.14}$$

Here $F(x)$ is the density of the Marcenko-Pastur law.
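The closed forms of $m_1$, $m_2$ and $m_3$ can be sanity-checked by numerical integration against the Marcenko-Pastur density. The sketch below is only an illustration: it assumes the convention $N/p \to \gamma^2$, takes the integrands to be $x/(p-x)$, $x^2/((p-x)(p'-x))$ and $x/(p-x)^2$, and uses hypothetical helper names (`mp_integral`, `p_of`, `max_discrepancy`) that do not appear in the paper.

```python
import math

def mp_integral(func, gamma, n=4000):
    """Integral of func(x) * F(x) over the Marcenko-Pastur sea (midpoint rule
    after the substitution x = c + d*cos(t))."""
    lo, hi = (1 - 1 / gamma) ** 2, (1 + 1 / gamma) ** 2
    c, d = 0.5 * (lo + hi), 0.5 * (hi - lo)
    h = math.pi / n
    total = 0.0
    for k in range(n):
        t = (k + 0.5) * h
        x = c + d * math.cos(t)
        root = d * math.sin(t)  # equals sqrt((hi - x) * (x - lo))
        density = gamma ** 2 / (2 * math.pi * x) * root
        total += func(x) * density * root  # dx = d*sin(t) dt
    return total * h

def p_of(a, gamma):
    """Limit p_a = a + gamma^{-2} a / (a - 1) of the sample eigenvalue."""
    return a + a / (gamma ** 2 * (a - 1))

def max_discrepancy(a, b, gamma):
    """Largest gap between the numerical integrals and the closed forms."""
    y = gamma ** -2
    pa, pb = p_of(a, gamma), p_of(b, gamma)
    e = (a - 1) * (b - 1)
    m1_closed = 1 / (a - 1)
    m2_closed = (e + y * a * b - y) / (e * (e - y))
    m3_closed = 1 / ((a - 1) ** 2 - y)
    m1_num = mp_integral(lambda x: x / (pa - x), gamma)
    m2_num = mp_integral(lambda x: x * x / ((pa - x) * (pb - x)), gamma)
    m3_num = mp_integral(lambda x: x / (pa - x) ** 2, gamma)
    return max(abs(m1_num - m1_closed),
               abs(m2_num - m2_closed),
               abs(m3_num - m3_closed))

if __name__ == "__main__":
    # One spike above the sea (a = 3) and one below it (b = 0.2), gamma = 2.
    print(max_discrepancy(3.0, 0.2, 2.0))
```

Both spikes must lie outside $[1 - \gamma^{-1}, 1 + \gamma^{-1}]$ so that $p_a$ stays away from the support and the integrands remain bounded.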

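Lemma 2.1 Let $A_N(\lambda) = \{a_{st}^{(N)}(\lambda)\}_{s,t=1}^N$, where $\lambda = p_{a_j} + x_j/\sqrt{N}$ for some fixed $x_j$, and similarly $\lambda' = p_{a_{j'}} + x_{j'}/\sqrt{N}$. We have

$$\frac{1}{N}\sum_{s=1}^N a_{ss}^{(N)}(\lambda)\, a_{ss}^{(N)}(\lambda') \xrightarrow{P} \Big(1 + \frac{\gamma^{-2}[1 + m_1(p_{a_j})]}{p_{a_j} - \gamma^{-2}[1 + m_1(p_{a_j})]}\Big)\Big(1 + \frac{\gamma^{-2}[1 + m_1(p_{a_{j'}})]}{p_{a_{j'}} - \gamma^{-2}[1 + m_1(p_{a_{j'}})]}\Big). \tag{2.15}$$

Proof of Lemma 2.1. The proof is similar to that of Lemma 6.1 in [3]. By that lemma we have for each $1 \le s \le N$,

$$a_{ss}^{(N)}(\lambda) \xrightarrow{P} 1 + \frac{\gamma^{-2}[1 + m_1(p_{a_j})]}{p_{a_j} - \gamma^{-2}[1 + m_1(p_{a_j})]} := C_j, \tag{2.16}$$

where $C_j$ serves as a shorthand notation. Also we have for each $\lambda$, by the same lemma,

$$\sup_N E\big(a_{11}^{(N)}(\lambda)\big)^4 < \infty.$$

Hence for $\lambda$ and $\lambda'$ and any $M > 0$ we have

$$E\Big[\big|a_{11}^{(N)}(\lambda)\, a_{11}^{(N)}(\lambda')\big|\, 1_{|a_{11}^{(N)}(\lambda) a_{11}^{(N)}(\lambda')| > M}\Big] \le \frac{1}{2M}\Big[\sup_N E\big(a_{11}^{(N)}(\lambda)\big)^4 + \sup_N E\big(a_{11}^{(N)}(\lambda')\big)^4\Big] < \infty.$$

Hence $a_{11}^{(N)}(\lambda)\, a_{11}^{(N)}(\lambda')$ is uniformly integrable. Hence

$$E\Big|\frac{1}{N}\sum_{s=1}^N a_{ss}^{(N)}(\lambda)\, a_{ss}^{(N)}(\lambda') - C_j C_{j'}\Big| \le E\Big|a_{11}^{(N)}(\lambda)\, a_{11}^{(N)}(\lambda') - C_j C_{j'}\Big| \to 0. \qquad \Box$$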
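Lemma 2.2 In the same setting as Lemma 2.1 we have

$$\frac{1}{N}\sum_{s,t=1}^N a_{st}^{(N)}(\lambda)\, a_{st}^{(N)}(\lambda') \xrightarrow{P} 1 + \gamma^{-2} m_1(p_{a_j}) + \gamma^{-2} m_1(p_{a_{j'}}) + \gamma^{-2} m_2(p_{a_j}, p_{a_{j'}}). \tag{2.17}$$

Proof of Lemma 2.2. We have $A_N(\lambda) = I + B_N(\lambda)$ for $B_N(\lambda) := X_\eta(\lambda I - X_\eta^T X_\eta)^{-1} X_\eta^T$. By Lemma 6.1 in [4] we have

$$\frac{1}{N}\mathrm{tr}(B_N(\lambda)) \xrightarrow{P} \gamma^{-2} m_1(p_{a_j}), \qquad \frac{1}{N}\mathrm{tr}(B_N(\lambda) B_N(\lambda')) \xrightarrow{P} \gamma^{-2} m_2(p_{a_j}, p_{a_{j'}}).$$

Hence

$$\begin{aligned} \frac{1}{N}\sum_{s,t=1}^N a_{st}^{(N)}(\lambda)\, a_{st}^{(N)}(\lambda') &= \frac{1}{N}\mathrm{tr}(A_N(\lambda) A_N(\lambda')) \\ &= 1 + \frac{1}{N}\mathrm{tr}(B_N(\lambda)) + \frac{1}{N}\mathrm{tr}(B_N(\lambda')) + \frac{1}{N}\mathrm{tr}(B_N(\lambda') B_N(\lambda)) \\ &\xrightarrow{P} 1 + \gamma^{-2} m_1(p_{a_j}) + \gamma^{-2} m_1(p_{a_{j'}}) + \gamma^{-2} m_2(p_{a_j}, p_{a_{j'}}). \qquad \Box \end{aligned}$$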



Lemma 2.3 In the same setting as Lemma 2.1 there exist some constants $M > 0$ and $c > 0$ such that

$$P\Big(\max_{s,t} |a_{st}^{(N)}(\lambda)| > M\Big) < \exp(-cN), \quad \text{for all } N \ge 1. \tag{2.18}$$

Proof of Lemma 2.3. Following the notation in Lemma 2.2 we still set $A_N(\lambda) = I + B_N(\lambda)$. By the singular value decomposition of $B_N$ we know

$$\|B_N(\lambda)\|_2^2 \xrightarrow{P} \max\Big(\frac{\lambda_+^2}{(\lambda_+ - p_{a_j})^2}, \frac{\lambda_-^2}{(\lambda_- - p_{a_j})^2}\Big).$$

Also, by the limiting distribution of the extreme eigenvalues of $X_\eta^T X_\eta$ we have, for any $M$ with $M^2 > \max\big(\lambda_+^2/(\lambda_+ - p_{a_j})^2, \lambda_-^2/(\lambda_- - p_{a_j})^2\big)$, that there exists some constant $c > 0$ such that

$$P(\|B_N(\lambda)\|_2 > M) < \exp(-cN).$$

For any $s, t \in \{1, 2, \dots, N\}$, denote by $e_s = (0, \dots, 1, \dots, 0)^T$ the column vector with all zero entries except a single one in the $s$-th entry. Hence

$$|a_{st}^{(N)}(\lambda)| \le |(B_N(\lambda))_{st}| + 1 = |e_s^T B_N(\lambda) e_t| + 1 \le \|B_N(\lambda)\|_2 + 1.$$

This gives

$$P\Big(\max_{s,t} |a_{st}^{(N)}(\lambda)| > M + 1\Big) \le P(\|B_N(\lambda)\|_2 > M) < \exp(-cN). \qquad \Box$$

Lemma 2.1 to Lemma 2.3 imply that for $\lambda = p_{a_j} + x_j/\sqrt{N}$ with fixed $x_j$, $A_N(\lambda)$ satisfies all the assumptions in Theorem 4.2. Equipped with these lemmas, we can now fulfil our promise: to establish a central limit theorem for $X_\xi^T A_N(\lambda) X_\xi$ as well as the local fluctuations of $\lambda^{(j)}$. This is done in the next two subsections.

2.2 Formula for the j-th pack of the sample eigenvalues 

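Recall that in (2.11) we decomposed $\lambda I - X_\xi^T A_N(\lambda) X_\xi$ into two parts. By Corollary 4.1 the second part satisfies $X_\xi^T A_N(\lambda) X_\xi - \mathrm{tr}(A_N(\lambda))\Sigma/N \to 0$ in probability. Let us find the limit of the first part $\lambda I - \mathrm{tr}(A_N(\lambda))\Sigma/N$. This is a diagonal matrix and for $s \in \{1, 2, \dots, r\}$, if $s \in I_i$ for some $i \ne j$ we have

$$\Big(\lambda I - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma\Big)_{ss} = p_{a_j} - (1 + \gamma^{-2} m_1(p_{a_j}))\, a_i + O_p(N^{-1/2}) = p_{a_j} - p_{a_j}\frac{a_i}{a_j} + O_p(N^{-1/2}).$$

Here we used the equality $p_{a_j} = (1 + \gamma^{-2} m_1(p_{a_j}))\, a_j$ for any $j = 1, 2, \dots, q$. For $s \in I_j$, we have

$$\begin{aligned} \Big(\lambda I - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma\Big)_{ss} &= p_{a_j} + \frac{x_j}{\sqrt{N}} - \big(1 + \gamma^{-2} m_1(p_{a_j} + x_j/\sqrt{N})\big)\, a_j + o_p(N^{-1/2}) \\ &= \big(1 + \gamma^{-2} a_j m_3(p_{a_j})\big)\frac{x_j}{\sqrt{N}} + o_p(N^{-1/2}). \end{aligned} \tag{2.19}$$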
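The step leading to (2.19) is a first-order Taylor expansion of $m_1$. A sketch, using the hypothetical shorthand $\delta := x_j/\sqrt{N}$ and $p := p_{a_j}$, and assuming the integral forms of $m_1$ and $m_3$ from (2.12) and (2.14):

```latex
\begin{aligned}
\frac{d}{dp}\, m_1(p)
  &= \frac{d}{dp}\int_{\lambda_-}^{\lambda_+} \frac{x}{p - x}\,F(x)\,dx
   = -\int_{\lambda_-}^{\lambda_+} \frac{x}{(p - x)^2}\,F(x)\,dx
   = -m_3(p), \\
m_1(p + \delta) &= m_1(p) - m_3(p)\,\delta + O(\delta^2), \\
p + \delta - \big(1 + \gamma^{-2} m_1(p + \delta)\big)a_j
  &= \underbrace{p - \big(1 + \gamma^{-2} m_1(p)\big)a_j}_{=\,0}
   + \big(1 + \gamma^{-2} a_j m_3(p)\big)\delta + O(\delta^2).
\end{aligned}
```

The underbraced term vanishes because $p_{a_j} = (1 + \gamma^{-2} m_1(p_{a_j}))\,a_j$, and $O(\delta^2) = O(N^{-1})$ is absorbed into the $o_p(N^{-1/2})$ remainder.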



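In summary, by Corollary 4.1, for any $0 < \kappa < 1/2$ we have

$$\big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{st} = \begin{cases} \Big(p_{a_j} - p_{a_j}\dfrac{a_i}{a_j}\Big) 1_{s=t} + o_p(N^{-\kappa}) & \text{if } s \in I_i \text{ for some } i \ne j, \\ o_p(N^{-\kappa}) & \text{if } s \in I_j. \end{cases} \tag{2.20}$$

Hence the matrix $\lambda I - X_\xi^T A_N(\lambda) X_\xi$ will converge to a diagonal matrix with the block $I_j \times I_j$ being all zeros. This is quite intuitive, as $p_{a_j}$ is the limit of the $j$-th pack of the sample eigenvalues.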

By analyzing the limit of $\lambda I - X_\xi^T A_N(\lambda) X_\xi$, we can only obtain the first order approximation of the $j$-th pack of the sample eigenvalues. In order to get the second order approximation of the $j$-th pack, we need to obtain the second order approximation of the matrix $\lambda I - X_\xi^T A_N(\lambda) X_\xi$. Thus we define the matrix $G^{(j)} \in \mathbb{R}^{r \times r}$ such that

$$G^{(j)}_{st} := \begin{cases} \big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{st} & \text{if } s \notin I_j, \\ \sqrt{N}\,\big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{st} & \text{if } s \in I_j. \end{cases} \tag{2.21}$$

That is, we define $G^{(j)}$ by multiplying the rows $I_j$ of $\lambda I - X_\xi^T A_N(\lambda) X_\xi$ by $\sqrt{N}$, leaving the rest of the rows unchanged.

Since $\det(G^{(j)}) = \det(\lambda I - X_\xi^T A_N(\lambda) X_\xi) \cdot N^{r_j/2}$, in order to get the limiting behavior of the $j$-th pack of sample eigenvalues, we can turn to analyzing the roots of the equation $\det(G^{(j)}) = 0$. We know that its rows indexed by $\{1, \dots, r\} \setminus I_j$ are asymptotically diagonal. The rows indexed by $I_j$ will be dense. By our central limit theorem in Section 4 they will be of order $O_p(1)$. Hence, intuitively, regarding the determinant of $G^{(j)}$ we have the following lemma.



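Lemma 2.4 If $\lambda = p_{a_j} + x_j/\sqrt{N}$ for $x_j$ fixed, we have

$$\det G^{(j)} = \det\big([G^{(j)}]_{I_j \times I_j}\big) \cdot \prod_{i \ne j}\Big(p_{a_j} - p_{a_j}\frac{a_i}{a_j}\Big)^{r_i} + o_p(1). \tag{2.22}$$

Here recall that $r_i = |I_i|$ and $[G^{(j)}]_{I_j \times I_j}$ is the sub-matrix of $G^{(j)}$ with rows and columns indexed by $I_j$.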

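Proof of Lemma 2.4. By expanding the determinant we have

$$\det G^{(j)} = N^{r_j/2} \cdot \sum_{\sigma} \mathrm{sgn}(\sigma) \prod_{s=1}^r \big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{s,\sigma(s)}.$$

As in [8], we just need to prove that, for any permutation $\sigma$ such that there exists some $s_0 \notin I_j$ with $\sigma(s_0) \ne s_0$, we always have

$$N^{r_j/2} \prod_{s=1}^r \big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{s,\sigma(s)} \xrightarrow{P} 0. \tag{2.23}$$

Indeed, for any $0 < \kappa < 1/2$, if $s \in I_j$ we must have, by (2.20),

$$\big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{s,\sigma(s)} = o_p(N^{-\kappa}).$$

If there further exists some $s_0 \notin I_j$ such that $s_0 \ne \sigma(s_0)$, then by (2.20) we have

$$\big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{s_0,\sigma(s_0)} = o_p(N^{-\kappa}).$$

Hence

$$N^{r_j/2} \prod_{s=1}^r \big(\lambda I - X_\xi^T A_N(\lambda) X_\xi\big)_{s,\sigma(s)} = o_p\big(N^{r_j/2 - (r_j+1)\kappa}\big).$$

Thus we just need to choose $\kappa > r_j/[2(r_j + 1)]$ to prove (2.23). This completes the proof of the lemma. $\Box$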
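The choice of $\kappa$ in this proof comes from simple exponent bookkeeping, sketched below under the reading that each of the $r_j$ rows in $I_j$, plus the one off-diagonal entry in row $s_0 \notin I_j$, contributes a factor $o_p(N^{-\kappa})$, while the remaining factors are $O_p(1)$:

```latex
\begin{aligned}
N^{r_j/2}\cdot \underbrace{o_p(N^{-\kappa})\cdots o_p(N^{-\kappa})}_{r_j + 1 \text{ factors}}
  \cdot O_p(1)
  &= o_p\!\big(N^{\,r_j/2 - (r_j+1)\kappa}\big), \\
\frac{r_j}{2} - (r_j + 1)\kappa \le 0
  &\iff \kappa \ge \frac{r_j}{2(r_j + 1)}.
\end{aligned}
```

Since $r_j/[2(r_j+1)] < 1/2$ for every $r_j \ge 1$, there is always an admissible $\kappa$ in the interval $(0, 1/2)$ allowed by (2.20).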

By Lemma 2.4, in order to get the asymptotic behavior of the $j$-th pack of the eigenvalues we just need to consider the $r_j$ roots of the equation

$$\det\big([G^{(j)}]_{I_j \times I_j}\big) = 0.$$

By (2.19) we have, for fixed $x_j$,

$$G^{(j)}_{I_j \times I_j} = \big(1 + \gamma^{-2} a_j m_3(p_{a_j})\big)\, x_j I - R^{(j)} \tag{2.24}$$

where

$$R^{(j)} := \sqrt{N}\Big(X_\xi^T A_N(\lambda) X_\xi - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\Sigma\Big)_{I_j \times I_j} = \sqrt{N}\Big(X_\xi[:, I_j]^T A_N(\lambda) X_\xi[:, I_j] - \frac{1}{N}\mathrm{tr}(A_N(\lambda))\, a_j I\Big). \tag{2.25}$$

Here $X_\xi[:, I_j]$ represents the sub-matrix of $X_\xi$ consisting only of the columns indexed by $I_j$.

Later we will get a central limit theorem for $R^{(j)}$ for fixed $x_j$. That is, we will prove that $R^{(j)}$ converges weakly to a Gaussian matrix. Hence, intuitively speaking, the limiting distribution of the $j$-th pack of the eigenvalues will be the same as that of the $r_j$ eigenvalues of a certain Gaussian matrix, the limit of $R^{(j)}/(1 + \gamma^{-2} a_j m_3(p_{a_j}))$. More rigorous proofs will be provided in the next subsection.

Further, in order to get the joint distribution of these $q$ packs of eigenvalues, it suffices to establish the joint distribution of the limits of these $q$ matrices $R^{(j)}$ for $j = 1, \dots, q$. Since the limits are Gaussian, we just need to characterize the covariance between different entries among these $G^{(j)}$'s. This will be done in the next subsection.

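2.3 Central limit theorem for $\{G^{(j)}\}_{j=1}^q$ and finishing the proof

In this subsection we apply Theorem 4.2 to $\{R^{(j)}\}_{j=1}^q$. Using the notation of Theorem 4.2, we have $K = \sum_{i=1}^q r_i(r_i + 1)/2$. For $\ell = 1, 2, \dots, K$, we have

$$A_N(\ell) = I + X_\eta\Big(\big(p_{a_j} + x_j/\sqrt{N}\big) I - X_\eta^T X_\eta\Big)^{-1} X_\eta^T \quad \text{for all } \ell \text{ with } \sum_{i=1}^{j-1} \frac{r_i(r_i+1)}{2} < \ell \le \sum_{i=1}^{j} \frac{r_i(r_i+1)}{2}.$$

By Lemmas 2.1 to 2.3 we know that the $A_N(\ell)$ satisfy the assumptions of Theorem 4.2, with $\omega_{\ell\ell'}$ and $\theta_{\ell\ell'}$ defined below. For any $\ell, \ell'$ such that

$$\sum_{i=1}^{j-1} \frac{r_i(r_i+1)}{2} < \ell \le \sum_{i=1}^{j} \frac{r_i(r_i+1)}{2}, \qquad \sum_{i=1}^{j'-1} \frac{r_i(r_i+1)}{2} < \ell' \le \sum_{i=1}^{j'} \frac{r_i(r_i+1)}{2},$$

we have

$$\omega_{\ell\ell'} = \Big(1 + \frac{\gamma^{-2}[1 + m_1(p_{a_j})]}{p_{a_j} - \gamma^{-2}[1 + m_1(p_{a_j})]}\Big)\Big(1 + \frac{\gamma^{-2}[1 + m_1(p_{a_{j'}})]}{p_{a_{j'}} - \gamma^{-2}[1 + m_1(p_{a_{j'}})]}\Big) = \Big(1 + \frac{\gamma^{-2}}{a_j - 1}\Big)\Big(1 + \frac{\gamma^{-2}}{a_{j'} - 1}\Big), \tag{2.26}$$

$$\theta_{\ell\ell'} = 1 + \gamma^{-2} m_1(p_{a_j}) + \gamma^{-2} m_1(p_{a_{j'}}) + \gamma^{-2} m_2(p_{a_j}, p_{a_{j'}}) = \frac{(a_j - 1 + \gamma^{-2})(a_{j'} - 1 + \gamma^{-2})}{(a_j - 1)(a_{j'} - 1) - \gamma^{-2}}. \tag{2.27}$$

For notational convenience, we denote the right hand sides of (2.26) and (2.27) by $\omega_{jj'}$ and $\theta_{jj'}$, respectively.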
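The two simplifications in (2.26) and (2.27) are purely algebraic once the closed forms of $m_1$ and $m_2$ are inserted. The short sketch below checks them numerically for one pair of spikes. It assumes $y = \gamma^{-2}$ and the closed forms $m_1 = 1/(a-1)$ and $m_2$ as in (2.13); all function names are illustrative.

```python
def m1(a):
    """Closed form of m_1(p_a)."""
    return 1.0 / (a - 1.0)

def m2(a, b, y):
    """Closed form of m_2(p_a, p_b), with y = gamma**-2."""
    e = (a - 1.0) * (b - 1.0)
    return (e + y * a * b - y) / (e * (e - y))

def p_of(a, y):
    """p_a = (1 + y * m_1(p_a)) * a."""
    return a * (1.0 + y * m1(a))

def omega_from_definition(a, b, y):
    """Left-hand form of omega: product of the two diagonal limits."""
    def c(s):
        return 1.0 + y * (1.0 + m1(s)) / (p_of(s, y) - y * (1.0 + m1(s)))
    return c(a) * c(b)

def omega_closed(a, b, y):
    return (1.0 + y / (a - 1.0)) * (1.0 + y / (b - 1.0))

def theta_from_definition(a, b, y):
    """Left-hand form of theta: limit of tr(A_N(lambda) A_N(lambda')) / N."""
    return 1.0 + y * m1(a) + y * m1(b) + y * m2(a, b, y)

def theta_closed(a, b, y):
    return (a - 1.0 + y) * (b - 1.0 + y) / ((a - 1.0) * (b - 1.0) - y)

if __name__ == "__main__":
    a, b, y = 3.0, 0.2, 0.25  # gamma = 2
    print(omega_from_definition(a, b, y), omega_closed(a, b, y))
    print(theta_from_definition(a, b, y), theta_closed(a, b, y))
```

The key cancellation for $\omega$ is $p_a - \gamma^{-2}[1 + m_1(p_a)] = a$, which follows from $1 + m_1(p_a) = a/(a-1)$.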



Also we have 



•ME 



ri{ri + 1) 



i=l 



■rj(rj + i)/2 



\ ^jyiZl n+i' • • • ' ^E^Ii n+i' ^E^-i n+2' • • • ' ^Eti 



r-i+2' 







^il:^^^+ 



i=l 



rj{rj+l)/2 



I^ES'-.+i' 



-%;in+r,'^E£in+2' 



; ^Y-J-l 



Ei=i ^»+''j 




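Then by using Theorem 4.2, for every fixed $x_j$, our $q$ matrices $\{R^{(j)}\}_{j=1}^q$ will converge to $q$ Gaussian matrices denoted by $\{G^{(j)}\}_{j=1}^q$. Writing $k_j := \sum_{i=1}^{j-1} r_i$ and $\xi_k$ for the $k$-th entry of a generic sample $\xi_i$, the intra-matrix covariances are

$$\mathrm{Cov}(G^{(j)}_{st}, G^{(j)}_{uv}) = \omega_{jj}\Big(E[\xi_{k_j+s}\,\xi_{k_j+u}\,\xi_{k_j+t}\,\xi_{k_j+v}] - a_j^2\, 1_{s=t,u=v}\Big) + (\theta_{jj} - \omega_{jj})\Big(a_j^2\, 1_{s=v,u=t} + a_j^2\, 1_{s=u,t=v}\Big)$$

for any $1 \le s \le t \le r_j$, $1 \le u \le v \le r_j$. For the inter-matrix covariances, we have

$$\mathrm{Cov}(G^{(j)}_{st}, G^{(j')}_{uv}) = \omega_{jj'}\Big(E[\xi_{k_j+s}\,\xi_{k_{j'}+u}\,\xi_{k_j+t}\,\xi_{k_{j'}+v}] - a_j a_{j'}\, 1_{s=t,u=v}\Big)$$

for any $1 \le s, t \le r_j$, $1 \le u, v \le r_{j'}$ and for any $1 \le j \ne j' \le q$.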

We just proved that for fixed $\{x_j\}_{j=1}^q$, $\{R^{(j)}\}_{j=1}^q$ will jointly converge weakly to Gaussian matrices $\{G^{(j)}\}_{j=1}^q$. Recall that our matrix $R^{(j)} = R^{(j)}(x_j)$ can be regarded as a stochastic process in $x_j \in \mathbb{R}$. Our next lemma studies the convergence of $\{R^{(j)}\}_{j=1}^q$ as a process.

Lemma 2.5 The stochastic process $\{R^{(j)}(x_j)\}_{j=1}^q$ defined on $\{x_j\}_{j=1}^q \in \mathbb{R}^q$ converges to $\{G^{(j)}\}_{j=1}^q$ weakly in the sense of finite dimensional distributions.

Proof of Lemma 2.5. We just need to prove that the finite dimensional distributions of the process $\{R^{(j)}(x_j)\}_{j=1}^q$ converge weakly to those of the process $\{G^{(j)}\}_{j=1}^q$ (this is a process constant in $x_j$). That is, we need to prove that for any positive integer $k$ and any $\{x_{j1}\}_{j=1}^q, \dots, \{x_{jk}\}_{j=1}^q \in \mathbb{R}^q$, the distribution of

$$R^{(1)}(x_{11}), \dots, R^{(1)}(x_{1k}), R^{(2)}(x_{21}), \dots, R^{(2)}(x_{2k}), \dots, R^{(q)}(x_{q1}), \dots, R^{(q)}(x_{qk})$$

will converge weakly to the distribution of

$$G^{(1)}, \dots, G^{(1)}, G^{(2)}, \dots, G^{(2)}, \dots, G^{(q)}, \dots, G^{(q)}.$$

The proof is very similar to what we have done before: we just use Theorem 4.2 for all these $qk$ matrices $\{R^{(j)}(x_{ji})\}_{1 \le j \le q, 1 \le i \le k}$. $\Box$

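Now putting all the parts together, we can finally finish the proof of Theorem 1.1.

Proof of Theorem 1.1. Recall that $\{\lambda^{(j)}\}_{j=1}^r$ are our $r$ extreme sample eigenvalues. Denote

$$x_{\sum_{i=1}^{j-1} r_i + s} := \sqrt{N}\,\big(\lambda^{(\sum_{i=1}^{j-1} r_i + s)} - p_{a_j}\big), \quad 1 \le s \le r_j, \ 1 \le j \le q.$$

Now these $x_i$'s are random, being no longer fixed. Let

$$y_{j,1} < z_{j,1} < y_{j,2} < z_{j,2} < \cdots < y_{j,r_j} < z_{j,r_j}, \quad 1 \le j \le q$$

be $2\sum_{i=1}^q r_i = 2r$ fixed constants. For notational convenience, define

$$\phi_{j,i} := p_{a_j} + \frac{y_{j,i}}{\sqrt{N}}, \qquad \psi_{j,i} := p_{a_j} + \frac{z_{j,i}}{\sqrt{N}}, \qquad i = 1, \dots, r_j, \ j = 1, \dots, q.$$

Then

$$\begin{aligned} &P\Big(y_{j,\ell} < x_{\sum_{i=1}^{j-1} r_i + \ell} < z_{j,\ell}, \ \forall \ell = 1, \dots, r_j, \ \forall j = 1, \dots, q\Big) \\ &= P\Big(\det\big(\phi_{j,\ell} I - X_\xi^T A_N(\phi_{j,\ell}) X_\xi\big) \cdot \det\big(\psi_{j,\ell} I - X_\xi^T A_N(\psi_{j,\ell}) X_\xi\big) < 0, \ \forall \ell = 1, \dots, r_j, \ \forall j = 1, \dots, q\Big) \\ &\to P\Big(\det\Big(y_{j,\ell} I - \frac{G^{(j)}}{1 + \gamma^{-2} a_j m_3(p_{a_j})}\Big) \cdot \det\Big(z_{j,\ell} I - \frac{G^{(j)}}{1 + \gamma^{-2} a_j m_3(p_{a_j})}\Big) < 0, \ \forall \ell = 1, \dots, r_j, \ \forall j = 1, \dots, q\Big). \end{aligned}$$

This finishes the proof, as the last expression is exactly the probability that the $r_j$ eigenvalues of $(1 + \gamma^{-2} a_j m_3(p_{a_j}))^{-1} \cdot G^{(j)}$ lie between $y_{j,i}$ and $z_{j,i}$, respectively, for $i = 1, \dots, r_j$ and $j = 1, \dots, q$. $\Box$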



3 Asymptotic result for eigenvectors 

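As we have said in Section 1, in this section we only consider the case where all the true eigenvalues are distinct. That is, $q = r$ and $\Sigma = \mathrm{diag}\{a_1, \dots, a_r\}$.

We denote by $(u^T, v^T)^T$ an eigenvector of the sample covariance matrix, where $u \in \mathbb{R}^r$, $v \in \mathbb{R}^{p-r}$. Then we have

$$\begin{pmatrix} \lambda I - X_\xi^T X_\xi & -X_\xi^T X_\eta \\ -X_\eta^T X_\xi & \lambda I - X_\eta^T X_\eta \end{pmatrix} \begin{pmatrix} u \\ v \end{pmatrix} = 0. \tag{3.1}$$

Here $\lambda$ is the corresponding sample eigenvalue. Just as in Section 2, since we are only interested in the isolated eigenvalues, we can assume that $\lambda I - X_\eta^T X_\eta$ is non-singular. Then from (3.1) we get

$$\lambda u = X_\xi^T A_N(\lambda) X_\xi u, \tag{3.2}$$

$$v = (\lambda I - X_\eta^T X_\eta)^{-1} X_\eta^T X_\xi u. \tag{3.3}$$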


In the following few subsections, we will analyze the behavior of the eigenvector of the $j$-th eigenvalue, i.e., when $\lambda \approx p_{a_j}$. We denote such eigenvectors by $(u^{(j)T}, v^{(j)T})^T$ and the corresponding eigenvalue by $\lambda^{(j)}$. Since the eigenvectors are unique only up to scaling, we require that $\|u^{(j)}\|_2 = 1$ and $u_j^{(j)} > 0$. Here $u_j^{(j)}$ is the $j$-th entry of $u^{(j)}$. Also, for notational convenience, we denote by $u_{-j}^{(j)}$ the $(r-1)$-dimensional vector obtained by deleting the $j$-th entry of $u^{(j)}$.

Recall that in Section 2 we proved that $\sqrt{N} \cdot (X_\xi^T A_N(\lambda) X_\xi - \mathrm{tr}(A_N(\lambda))\Sigma/N)$ will converge weakly to a Gaussian matrix, for $\lambda = p_{a_j} + x_j/\sqrt{N}$ with fixed $x_j$. In this section, however, we have to deal with $A_N(\lambda^{(j)})$, where $x_j$ is a random variable bounded in probability. Thus we need a more general result, stated in the following lemma.

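Lemma 3.1 For $A_N(\lambda^{(j)})$ defined above we still have

\[
\sqrt{N} \left( X_S^T A_N(\lambda^{(j)}) X_S - \frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big)\,\Sigma \right) \Rightarrow G^{(j)}.
\]

Here $G^{(j)}$ is defined in Theorem 1.1.

Proof of Lemma 3.1. We have

\[
X_S^T A_N(\lambda^{(j)}) X_S - \frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big)\Sigma
= X_S^T A_N(\rho_{a_j}) X_S - \frac{1}{N}\mathrm{tr}\big(A_N(\rho_{a_j})\big)\Sigma + R_N.
\]

Here the first term is asymptotically $G^{(j)}/\sqrt{N}$ by the previous proof. Hence we just need to show that the residual term satisfies $R_N = o_p(N^{-1/2})$. Now $R_N = X_S^T D_N(\lambda^{(j)}) X_S - \frac{1}{N}\mathrm{tr}\big(D_N(\lambda^{(j)})\big)\Sigma$, where, using $\lambda^{(j)} - \rho_{a_j} = x_j/\sqrt{N}$ and the resolvent identity twice,

\begin{align*}
D_N(\lambda^{(j)}) &= A_N(\lambda^{(j)}) - A_N(\rho_{a_j}) \\
&= \left( I + X_\eta (\lambda^{(j)} I - X_\eta^T X_\eta)^{-1} X_\eta^T \right) - \left( I + X_\eta (\rho_{a_j} I - X_\eta^T X_\eta)^{-1} X_\eta^T \right) \\
&= \underbrace{-\frac{x_j}{\sqrt{N}} \cdot X_\eta (\rho_{a_j} I - X_\eta^T X_\eta)^{-2} X_\eta^T}_{\text{part I}}
 + \underbrace{\frac{x_j^2}{N} \cdot X_\eta (\lambda^{(j)} I - X_\eta^T X_\eta)^{-1} (\rho_{a_j} I - X_\eta^T X_\eta)^{-2} X_\eta^T}_{\text{part II}}.
\end{align*}

For the first part, using Corollary 4.1 we can show

\[
\frac{x_j}{\sqrt{N}} \cdot \left[ X_S^T X_\eta (\rho_{a_j} I - X_\eta^T X_\eta)^{-2} X_\eta^T X_S - \frac{1}{N}\mathrm{tr}\big( X_\eta (\rho_{a_j} I - X_\eta^T X_\eta)^{-2} X_\eta^T \big)\Sigma \right] = o_p(N^{-1/2}).
\]

For the second part, the norm $\| X_\eta (\lambda^{(j)} I - X_\eta^T X_\eta)^{-1} (\rho_{a_j} I - X_\eta^T X_\eta)^{-2} X_\eta^T \|_2 = O_p(1)$ is bounded in probability. Hence the second part is of order $O_p(N^{-1})$. Putting these together, we have proved that $R_N = o_p(N^{-1/2})$, which completes the proof of the lemma. $\square$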
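The split of $D_N$ into a first-order and a second-order part rests on applying the resolvent identity twice. The following minimal sketch checks that algebraic step numerically on a small symmetric matrix. The matrix `M` (standing in for $X_\eta^T X_\eta$), the values of `rho` and `delta` (standing in for $\rho_{a_j}$ and $x_j/\sqrt{N}$), and all helper names are illustrative assumptions, not quantities taken from the paper.

```python
# Check: (lam*I - M)^{-1} - (rho*I - M)^{-1}
#        = -delta * (rho*I - M)^{-2} + delta^2 * (lam*I - M)^{-1} (rho*I - M)^{-2},
# where lam = rho + delta. All numbers below are illustrative.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def mul(p, q):
    """Product of two 2x2 matrices."""
    return [[sum(p[i][k] * q[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def resolvent(z, m):
    """Return (z*I - m)^{-1} for a 2x2 matrix m."""
    return inv2([[z - m[0][0], -m[0][1]], [-m[1][0], z - m[1][1]]])

M = [[1.0, 0.3], [0.3, 0.5]]   # hypothetical stand-in for X_eta^T X_eta
rho, delta = 3.0, 0.05         # rho lies outside the spectrum of M
lam = rho + delta

R_lam = resolvent(lam, M)
R_rho = resolvent(rho, M)
R_rho_sq = mul(R_rho, R_rho)
R_mixed = mul(R_lam, R_rho_sq)

# Left side: difference of resolvents. Right side: part I + part II.
lhs = [[R_lam[i][j] - R_rho[i][j] for j in range(2)] for i in range(2)]
rhs = [[-delta * R_rho_sq[i][j] + delta ** 2 * R_mixed[i][j] for j in range(2)]
       for i in range(2)]

error = max(abs(lhs[i][j] - rhs[i][j]) for i in range(2) for j in range(2))
print("max entrywise error:", error)
```

Sandwiching both sides between $X_\eta$ and $X_\eta^T$ gives the decomposition of $D_N$ used in the proof.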
3.1 Proof of Theorem 1.2

Intuitively, $u^{(j)}$ should be close to $e_j$. Here $e_j$ is the vector of all zeros except for the $j$-th entry, which is one. Hence $u_j^{(j)} \approx 1$ and $u_{-j}^{(j)} \approx 0$. The following lemma confirms this intuition.

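Lemma 3.2 We have

\begin{align}
u_{-j}^{(j)} &= O_p(N^{-1/2}), \tag{3.4} \\
u_j^{(j)} &= 1 + O_p(N^{-1}). \tag{3.5}
\end{align}

Proof of Lemma 3.2. As shown in Section 2, we have

\[
X_S^T A_N(\lambda^{(j)}) X_S = \frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big)\Sigma + \frac{1}{\sqrt{N}} R_N^{(j)}. \tag{3.6}
\]

Here

\[
R_N^{(j)} := \sqrt{N} \left[ X_S^T A_N(\lambda^{(j)}) X_S - \frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big)\Sigma \right] \tag{3.7}
\]

converges, by Lemma 3.1, in distribution to a Gaussian matrix. Hence $R_N^{(j)} = O_p(1)$. Moreover, by the previous section we also have

\[
\lambda^{(j)} = \rho_{a_j} + \frac{1}{\sqrt{N}} x_j \tag{3.8}
\]

for some $x_j = O_p(1)$. Substituting (3.6) and (3.8) in (3.2) gives

\[
\left( \rho_{a_j} + \frac{x_j}{\sqrt{N}} \right) u^{(j)} = \frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big)\Sigma u^{(j)} + \frac{1}{\sqrt{N}} R_N^{(j)} u^{(j)}.
\]

Since

\[
\frac{1}{N}\mathrm{tr}\big(A_N(\lambda^{(j)})\big) = \frac{\rho_{a_j}}{a_j} - \frac{1}{\sqrt{N}} \gamma^{-2} m_3(\rho_{a_j}) x_j + o_p(N^{-1/2}),
\]

we obtain

\[
\rho_{a_j} u^{(j)} + \frac{x_j}{\sqrt{N}} u^{(j)} = \frac{\rho_{a_j}}{a_j} \Sigma u^{(j)} - \frac{1}{\sqrt{N}} \gamma^{-2} m_3(\rho_{a_j}) x_j \Sigma u^{(j)} + \frac{1}{\sqrt{N}} R_N^{(j)} u^{(j)} + o_p(N^{-1/2}). \tag{3.9}
\]

In (3.9), comparing all the entries except the $j$-th, we get

\[
\rho_{a_j} \left( I - \frac{\Sigma_{-j,-j}}{a_j} \right) u_{-j}^{(j)} = O_p(N^{-1/2}). \tag{3.10}
\]

Here $\Sigma_{-j,-j}$ is the $(r-1) \times (r-1)$ sub-matrix of $\Sigma$ obtained by deleting its $j$-th row and $j$-th column. All the remaining terms in (3.9) can be written as $O_p(N^{-1/2})$ because $u^{(j)} = O_p(1)$, as it has unit norm. In (3.10), since the matrix on the left-hand side is non-singular (the $a_i$ are distinct), we must have $u_{-j}^{(j)} = O_p(N^{-1/2})$, proving our first claim.

For the second claim, recalling $\|u^{(j)}\|_2 = 1$, we have

\[
\big(u_j^{(j)}\big)^2 = 1 - \|u_{-j}^{(j)}\|_2^2 = 1 + O_p(N^{-1}).
\]

Noting that $u_j^{(j)}$ is positive, we have proved (3.5). $\square$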

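By Lemma 3.2 we can write $u^{(j)}$ as

\[
u^{(j)} = e_j + \frac{1}{\sqrt{N}} \delta u^{(j)}, \tag{3.11}
\]

where $\delta u_j^{(j)} = o_p(1)$ and $\delta u_{-j}^{(j)} = O_p(1)$. Substituting (3.11) in (3.9) we obtain

\[
\left( \rho_{a_j} I - \frac{\rho_{a_j}}{a_j} \Sigma \right) \delta u^{(j)} = -\left( x_j I + \gamma^{-2} m_3(\rho_{a_j}) x_j \Sigma \right) e_j + R_N^{(j)} e_j + o_p(1). \tag{3.12}
\]

If we consider all the entries of (3.12) except the $j$-th one, we obtain

\[
\left( \rho_{a_j} I - \frac{\rho_{a_j}}{a_j} \Sigma_{-j,-j} \right) \delta u_{-j}^{(j)} = \big( R_N^{(j)} e_j \big)_{-j} + o_p(1). \tag{3.13}
\]

For every $j$, we get an equation for $\delta u_{-j}^{(j)}$ as in (3.13). Now, using Theorem 4.2 as well as the technique of Lemma 3.1, we know that as $N \to \infty$, our $r$ matrices $R_N^{(1)}, \ldots, R_N^{(r)}$ jointly converge to $r$ matrices, denoted by $G^{(1)}, \ldots, G^{(r)}$, with jointly Gaussian entries of mean zero. Their covariance is

\[
\mathrm{Cov}\big( G_{st}^{(j)}, G_{uv}^{(j')} \big) = \omega_{jj'} \left[ \mathbb{E}\,\xi_s \xi_t \xi_u \xi_v - a_s a_u 1_{s=t,\,u=v} \right] + (\theta_{jj'} - \omega_{jj'}) \left[ a_s a_t 1_{s=v,\,u=t} + a_s a_t 1_{s=u,\,t=v} \right],
\]

where $\omega_{jj'}$ and $\theta_{jj'}$ are defined in (2.26) and (2.27). Note that the $j$-th eigenvalue is just

\[
\lambda^{(j)} = \rho_{a_j} + \frac{1}{\sqrt{N}} \cdot \frac{G_{jj}^{(j)}}{1 + \gamma^{-2} a_j m_3(\rho_{a_j})} + o_p(N^{-1/2}).
\]

Together with the expression

\[
u_{-j}^{(j)} = N^{-1/2} \cdot \left( \rho_{a_j} I - \frac{\rho_{a_j}}{a_j} \Sigma_{-j,-j} \right)^{-1} \big( G^{(j)} e_j \big)_{-j} + o_p(N^{-1/2}), \tag{3.14}
\]

we complete the proof of the theorem.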
3.2 Proof of Theorem 1.3

In this subsection we analyze the angle between the sample eigenvector $((u^{(j)})^T, (v^{(j)})^T)^T$ and the true eigenvector $(e_j^T, 0^T)^T$. Here we define $\beta^{(j)} \in [0, \pi]$ by

\[
\cos \beta^{(j)} = \frac{u_j^{(j)}}{\sqrt{\|u^{(j)}\|_2^2 + \|v^{(j)}\|_2^2}} = \cos \mathrm{angle}\left( \begin{pmatrix} u^{(j)} \\ v^{(j)} \end{pmatrix}, \begin{pmatrix} e_j \\ 0 \end{pmatrix} \right). \tag{3.15}
\]



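First, for notational convenience, let us define some functions, just as in the previous section. We define

\begin{align}
m_4(\rho_{a_j}) &= \int_{\lambda_-}^{\lambda_+} \frac{2x}{(\rho_{a_j} - x)^3} F(x)\,dx = \frac{2(a_j - 1)^3}{\left( (a_j - 1)^2 - \gamma^{-2} \right)^3}, \tag{3.16} \\
m_5(\rho_{a_j}) &= \int_{\lambda_-}^{\lambda_+} \frac{1}{(\rho_{a_j} - x)^2} F(x)\,dx = \frac{(a_j - 1)^2}{(a_j - 1 + \gamma^{-2})^2 \left( (a_j - 1)^2 - \gamma^{-2} \right)}, \tag{3.17} \\
m_6(\rho_{a_j}) &= \int_{\lambda_-}^{\lambda_+} \frac{1}{\rho_{a_j} - x} F(x)\,dx = \frac{1}{a_j - 1 + \gamma^{-2}}, \tag{3.18} \\
m_7(\rho_{a_j}, \rho_{a_{j'}}) &= \int_{\lambda_-}^{\lambda_+} \frac{x^2}{(\rho_{a_j} - x)^2 (\rho_{a_{j'}} - x)^2} F(x)\,dx \tag{3.19} \\
&= \frac{(a_j - 1)^2 (a_{j'} - 1)^2 \left( (a_j - 1)(a_{j'} - 1) + \gamma^{-2}(a_j a_{j'} + a_j + a_{j'} - 2) + \gamma^{-4} \right)}{\left( (a_j - 1)^2 - \gamma^{-2} \right) \left( (a_{j'} - 1)^2 - \gamma^{-2} \right) \left( (a_j - 1)(a_{j'} - 1) - \gamma^{-2} \right)^3}, \notag \\
m_8(\rho_{a_j}, \rho_{a_{j'}}) &= \int_{\lambda_-}^{\lambda_+} \frac{x^2}{(\rho_{a_j} - x)(\rho_{a_{j'}} - x)^2} F(x)\,dx \tag{3.20} \\
&= \frac{(a_j - 1)(a_{j'} - 1)^2 + \gamma^{-2}(a_{j'} - 1)(a_j a_{j'} + a_j - 2) - \gamma^{-4}}{\left( (a_j - 1)(a_{j'} - 1) - \gamma^{-2} \right)^2 \left( (a_{j'} - 1)^2 - \gamma^{-2} \right)}, \notag
\end{align}

where $F(x)$ is the density of the Marčenko-Pastur law.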
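The closed forms above can be sanity-checked by integrating numerically against the Marčenko-Pastur density. The sketch below is one way to do so under two assumptions that are not restated in this section: the density has ratio parameter $y = \gamma^{-2}$ and unit variance, and $\rho_a = a + \gamma^{-2} a/(a-1)$. The function name `mp_integral` and the numerical values of `y`, `a` and `a_prime` are hypothetical choices made only for illustration.

```python
import math

def mp_integral(g, y, n=4000):
    """Integrate g(x) against the Marchenko-Pastur density with ratio y in (0, 1).

    The substitution x = (1 + y) + 2*sqrt(y)*cos(theta) turns
    sqrt((b - x)(x - a)) / (2*pi*y*x) dx into 2*sin(theta)^2 / (pi*x) dtheta,
    which is then integrated with the midpoint rule on [0, pi].
    """
    h = math.pi / n
    total = 0.0
    for i in range(n):
        theta = (i + 0.5) * h
        x = (1.0 + y) + 2.0 * math.sqrt(y) * math.cos(theta)
        total += g(x) * 2.0 * math.sin(theta) ** 2 / (math.pi * x)
    return total * h

y = 0.25                 # assumed value of gamma^{-2} (gamma = 2)
a, a_prime = 3.0, 4.0    # two spikes, both above 1 + sqrt(y)
rho = a + y * a / (a - 1.0)                          # assumed form of rho_a
rho_prime = a_prime + y * a_prime / (a_prime - 1.0)
b, b_prime = a - 1.0, a_prime - 1.0

mass = mp_integral(lambda x: 1.0, y)  # should be 1 for a probability density

numeric = {
    "m4": mp_integral(lambda x: 2.0 * x / (rho - x) ** 3, y),
    "m5": mp_integral(lambda x: 1.0 / (rho - x) ** 2, y),
    "m6": mp_integral(lambda x: 1.0 / (rho - x), y),
    "m7": mp_integral(lambda x: x ** 2 / ((rho - x) ** 2 * (rho_prime - x) ** 2), y),
    "m8": mp_integral(lambda x: x ** 2 / ((rho - x) * (rho_prime - x) ** 2), y),
}

closed = {
    "m4": 2.0 * b ** 3 / (b ** 2 - y) ** 3,
    "m5": b ** 2 / ((b + y) ** 2 * (b ** 2 - y)),
    "m6": 1.0 / (b + y),
    "m7": b ** 2 * b_prime ** 2
          * (b * b_prime + y * (a * a_prime + a + a_prime - 2.0) + y ** 2)
          / ((b ** 2 - y) * (b_prime ** 2 - y) * (b * b_prime - y) ** 3),
    "m8": (b * b_prime ** 2 + y * b_prime * (a * a_prime + a - 2.0) - y ** 2)
          / ((b * b_prime - y) ** 2 * (b_prime ** 2 - y)),
}

for name in sorted(closed):
    print(name, numeric[name], closed[name])
```

The trigonometric substitution removes the square-root behavior at the edges of the support, so a plain midpoint rule already agrees with the closed forms to many digits.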
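By (3.3) we have

\[
\|v^{(j)}\|_2^2 = \big(u^{(j)}\big)^T \cdot X_S^T C_N(\lambda^{(j)}) X_S \cdot u^{(j)}, \tag{3.21}
\]

where

\[
C_N(\lambda^{(j)}) = X_\eta (\lambda^{(j)} I - X_\eta^T X_\eta)^{-2} X_\eta^T.
\]

This time, we have

\[
X_S^T C_N(\lambda^{(j)}) X_S = \frac{1}{N}\mathrm{tr}\big(C_N(\lambda^{(j)})\big)\Sigma + \frac{1}{\sqrt{N}} \cdot Q_N^{(j)}, \tag{3.22}
\]

where

\[
Q_N^{(j)} := \sqrt{N} \cdot \left( X_S^T C_N(\lambda^{(j)}) X_S - \frac{1}{N}\mathrm{tr}\big(C_N(\lambda^{(j)})\big)\Sigma \right), \tag{3.23}
\]

which, by applying Theorem 4.2 and a technique similar to that of Lemma 3.1, converges to a real Gaussian matrix. Furthermore, we have

\[
\frac{1}{N}\mathrm{tr}\big(C_N(\lambda^{(j)})\big) = \gamma^{-2} m_3(\rho_{a_j}) - \frac{x_j}{\sqrt{N}} \gamma^{-2} m_4(\rho_{a_j}) + o_p(N^{-1/2}).
\]

Using the notation of the previous subsection,

\[
u^{(j)} = e_j + \frac{1}{\sqrt{N}} \cdot D^{(j)} R_N^{(j)} e_j + o_p(N^{-1/2}), \tag{3.24}
\]

where $R_N^{(j)}$ is defined in (3.7) and

\[
D^{(j)} = \mathrm{diag}\left\{ \frac{a_j}{\rho_{a_j}(a_j - a_1)}, \ldots, \frac{a_j}{\rho_{a_j}(a_j - a_{j-1})}, 0, \frac{a_j}{\rho_{a_j}(a_j - a_{j+1})}, \ldots, \frac{a_j}{\rho_{a_j}(a_j - a_r)} \right\}.
\]

Substituting (3.22) and (3.24) in (3.21) we obtain

\[
\|v^{(j)}\|_2^2 = \gamma^{-2} m_3(\rho_{a_j}) a_j - \frac{1}{\sqrt{N}} \gamma^{-2} m_4(\rho_{a_j}) a_j x_j + \frac{1}{\sqrt{N}} e_j^T Q_N^{(j)} e_j + o_p(N^{-1/2}). \tag{3.25}
\]

Using this formula in (3.15) we get

\[
\cos \beta^{(j)} = \frac{1}{\sqrt{1 + \gamma^{-2} m_3(\rho_{a_j}) a_j}} + \frac{1}{2\sqrt{N}} \cdot \frac{1}{\left( 1 + \gamma^{-2} m_3(\rho_{a_j}) a_j \right)^{3/2}} \left( \gamma^{-2} m_4(\rho_{a_j}) a_j x_j - e_j^T Q_N^{(j)} e_j \right) + o_p(N^{-1/2}). \tag{3.26}
\]

In order to get the convergence in distribution of $e_j^T Q_N^{(j)} e_j$, and its relationship with $x_j$, we can use Theorem 4.2. Before that, we need to derive some properties of the matrix $C_N(\lambda)$.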

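3.3 Properties of $C_N(\lambda)$ and finishing the proof

Let $C_N(\lambda) := \big( c_{st}^{(N)}(\lambda) \big)_{s,t=1}^{N}$. Then we have the following.

Lemma 3.3 For $\lambda = \rho_{a_j} + x_j/\sqrt{N}$ and $\lambda' = \rho_{a_{j'}} + x_{j'}/\sqrt{N}$ with $x_j, x_{j'}$ being fixed constants, we have

\[
\frac{1}{N} \sum_{s=1}^{N} c_{ss}^{(N)}(\lambda)\, c_{ss}^{(N)}(\lambda') \xrightarrow{P} \frac{\gamma^{-4} m_5(\lambda) m_5(\lambda')}{(1 - \gamma^{-2} m_6(\lambda))^2 (1 - \gamma^{-2} m_6(\lambda'))^2}. \tag{3.27}
\]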



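Proof of Lemma 3.3. Define $X_{\eta,-s}$ to be the $(N-1) \times (p-r)$ sub-matrix of $X_\eta$ obtained by deleting the $s$-th row, and write $B_s := \lambda I - X_{\eta,-s}^T X_{\eta,-s}$ for brevity. Then by the Sherman-Morrison formula we have

\[
(\lambda I - X_\eta^T X_\eta)^{-1} = B_s^{-1} + \frac{\frac{1}{N} B_s^{-1} \eta_s \eta_s^T B_s^{-1}}{1 - \frac{1}{N} \eta_s^T B_s^{-1} \eta_s}.
\]

Taking the square and pre- (resp. post-) multiplying by $\eta_s^T$ (resp. $\eta_s$) gives

\begin{align*}
\frac{1}{N} \eta_s^T (\lambda I - X_\eta^T X_\eta)^{-2} \eta_s
= \frac{1}{N} \eta_s^T B_s^{-2} \eta_s
&+ \frac{1}{N^3} \cdot \frac{\big( \eta_s^T B_s^{-1} \eta_s \big)^2 \big( \eta_s^T B_s^{-2} \eta_s \big)}{\left( 1 - \frac{1}{N} \eta_s^T B_s^{-1} \eta_s \right)^2} \\
&+ \frac{2}{N^2} \cdot \frac{\big( \eta_s^T B_s^{-1} \eta_s \big) \big( \eta_s^T B_s^{-2} \eta_s \big)}{1 - \frac{1}{N} \eta_s^T B_s^{-1} \eta_s}.
\end{align*}

Using the same proof as that of Lemma 6.1 in [3], we can prove that

\[
c_{ss}^{(N)}(\lambda) = \frac{1}{N} \eta_s^T (\lambda I - X_\eta^T X_\eta)^{-2} \eta_s \xrightarrow{P} \frac{\gamma^{-2} m_5(\lambda)}{(1 - \gamma^{-2} m_6(\lambda))^2}. \tag{3.28}
\]

Similar to Lemma 2.1, we can prove that $c_{ss}^{(N)}(\lambda)\, c_{ss}^{(N)}(\lambda')$ is uniformly integrable in $\lambda$. Hence (3.27) follows. $\square$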
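The Sherman-Morrison step in the proof can be checked on a small example. The sketch below verifies, for a $2 \times 2$ symmetric matrix, both the rank-one update of the inverse and the resulting identity for the quadratic form of the squared inverse. The matrix `B` (standing in for $\lambda I - X_{\eta,-s}^T X_{\eta,-s}$), the vector `eta` (standing in for $\eta_s/\sqrt{N}$) and the helper names are illustrative assumptions only.

```python
# Sherman-Morrison check on a 2x2 symmetric matrix with illustrative numbers.

def inv2(m):
    """Inverse of a 2x2 matrix given as nested lists."""
    (a, b), (c, d) = m
    det = a * d - b * c
    return [[d / det, -b / det], [-c / det, a / det]]

def matvec(m, v):
    return [m[0][0] * v[0] + m[0][1] * v[1], m[1][0] * v[0] + m[1][1] * v[1]]

def dot(u, v):
    return u[0] * v[0] + u[1] * v[1]

B = [[3.0, 0.5], [0.5, 2.0]]   # hypothetical stand-in for lambda*I - X_{eta,-s}^T X_{eta,-s}
eta = [0.6, -0.4]              # hypothetical stand-in for eta_s / sqrt(N)

# Removing the rank-one term eta eta^T from B gives the full matrix.
full = [[B[i][j] - eta[i] * eta[j] for j in range(2)] for i in range(2)]

B_inv = inv2(B)
full_inv = inv2(full)

w = matvec(B_inv, eta)        # B^{-1} eta
t = dot(eta, w)               # eta^T B^{-1} eta
q = dot(w, w)                 # eta^T B^{-2} eta (B is symmetric)

# Sherman-Morrison: (B - eta eta^T)^{-1} = B^{-1} + B^{-1} eta eta^T B^{-1} / (1 - t)
sm_inv = [[B_inv[i][j] + w[i] * w[j] / (1.0 - t) for j in range(2)] for i in range(2)]
inverse_error = max(abs(sm_inv[i][j] - full_inv[i][j])
                    for i in range(2) for j in range(2))

# Quadratic form of the squared inverse, computed three ways.
z = matvec(full_inv, eta)
direct = dot(z, z)                                   # eta^T (B - eta eta^T)^{-2} eta
compact = q / (1.0 - t) ** 2                         # closed form
expanded = q + 2.0 * t * q / (1.0 - t) + t * t * q / (1.0 - t) ** 2  # three-term form

print(inverse_error, direct, compact, expanded)
```

The three-term form is the one displayed in the proof (after multiplying through by $1/N$), and the compact form $q/(1-t)^2$ makes the limit in (3.28) easy to read off.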



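Lemma 3.4 For $\lambda = \rho_{a_j} + x_j/\sqrt{N}$ and $\lambda' = \rho_{a_{j'}} + x_{j'}/\sqrt{N}$ with $x_j, x_{j'}$ being fixed constants, we have

\[
\frac{1}{N} \sum_{s,t=1}^{N} c_{st}^{(N)}(\lambda)\, c_{st}^{(N)}(\lambda') \xrightarrow{P} \gamma^{-2} m_7(\lambda, \lambda'). \tag{3.29}
\]

Proof of Lemma 3.4. We have

\begin{align*}
\frac{1}{N} \sum_{s,t=1}^{N} c_{st}^{(N)}(\lambda)\, c_{st}^{(N)}(\lambda')
&= \frac{1}{N} \mathrm{tr}\, C_N(\lambda) C_N(\lambda') \\
&= \frac{1}{N} \mathrm{tr} \left[ X_\eta (\lambda I - X_\eta^T X_\eta)^{-2} X_\eta^T X_\eta (\lambda' I - X_\eta^T X_\eta)^{-2} X_\eta^T \right] \\
&\xrightarrow{P} \gamma^{-2} m_7(\lambda, \lambda').
\end{align*}

$\square$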
For completeness, we list the third lemma below. It is very similar to Lemma 2.3 and the proof is almost the same, so we omit it.

Lemma 3.5 For $\lambda = \rho_{a_j} + x_j/\sqrt{N}$ with $x_j$ being a fixed constant, there exist constants $M > 0$ and $c > 0$ such that

\[
P\left( \max_{s,t=1}^{N} \big| c_{st}^{(N)}(\lambda) \big| > M \right) < \exp(-cN). \tag{3.30}
\]

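Also, regarding the interaction term between the matrices $A_N(\lambda)$ and $C_N(\lambda')$, we have the following two results. Again, since the proof is almost the same, we omit it here.

Lemma 3.6 For $\lambda = \rho_{a_j} + x_j/\sqrt{N}$ and $\lambda' = \rho_{a_{j'}} + x_{j'}/\sqrt{N}$ with $x_j, x_{j'}$ being fixed constants, we have

\[
\frac{1}{N} \sum_{s=1}^{N} a_{ss}^{(N)}(\lambda)\, c_{ss}^{(N)}(\lambda') \xrightarrow{P} \left( 1 + \frac{\gamma^{-2}[1 + m_1(\lambda)]}{\lambda - \gamma^{-2}[1 + m_1(\lambda)]} \right) \frac{\gamma^{-2} m_5(\rho_{a_{j'}})}{(1 - \gamma^{-2} m_6(\rho_{a_{j'}}))^2},
\]

\[
\frac{1}{N} \sum_{s,t=1}^{N} a_{st}^{(N)}(\lambda)\, c_{st}^{(N)}(\lambda') \xrightarrow{P} \gamma^{-2} m_3(\rho_{a_{j'}}) + \gamma^{-2} m_8(\rho_{a_j}, \rho_{a_{j'}}).
\]

Equipped with Lemmas 3.3 to 3.5, and using the same technique as in Lemma 3.1, we can now get a central limit theorem for $X_S^T C_N(\lambda^{(j)}) X_S$.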



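Lemma 3.7 The entries $\{ e_j^T R_N^{(j)} e_j,\ e_j^T Q_N^{(j)} e_j \}_{j=1}^{r}$ converge in distribution to $\{ G_{jj}^{(j)}, H_{jj}^{(j)} \}_{j=1}^{r}$, where

\begin{align}
\mathrm{Cov}\big( G_{jj}^{(j)}, G_{j'j'}^{(j')} \big) &= \omega_{jj'} \left[ \mathbb{E}\,\xi_j^2 \xi_{j'}^2 - a_j a_{j'} \right] + 2(\theta_{jj'} - \omega_{jj'})\, a_j^2 1_{j=j'}, \tag{3.31} \\
\mathrm{Cov}\big( H_{jj}^{(j)}, H_{j'j'}^{(j')} \big) &= \tilde{\omega}_{jj'} \left[ \mathbb{E}\,\xi_j^2 \xi_{j'}^2 - a_j a_{j'} \right] + 2(\tau_{jj'} - \tilde{\omega}_{jj'})\, a_j^2 1_{j=j'}, \tag{3.32} \\
\mathrm{Cov}\big( G_{jj}^{(j)}, H_{j'j'}^{(j')} \big) &= \tilde{\kappa}_{jj'} \left[ \mathbb{E}\,\xi_j^2 \xi_{j'}^2 - a_j a_{j'} \right] + 2(\rho_{jj'} - \tilde{\kappa}_{jj'})\, a_j^2 1_{j=j'}. \tag{3.33}
\end{align}

Here $\tilde{\omega}_{jj'}$, $\tau_{jj'}$, $\tilde{\kappa}_{jj'}$ and $\rho_{jj'}$ are defined such that for all $1 \le j, j' \le r$,

\begin{align}
\tilde{\omega}_{jj'} &:= \frac{\gamma^{-4} m_5(\rho_{a_j}) m_5(\rho_{a_{j'}})}{(1 - \gamma^{-2} m_6(\rho_{a_j}))^2 (1 - \gamma^{-2} m_6(\rho_{a_{j'}}))^2}, \tag{3.35} \\
\tau_{jj'} &:= \gamma^{-2} m_7(\rho_{a_j}, \rho_{a_{j'}}), \tag{3.36} \\
\tilde{\kappa}_{jj'} &:= \left( 1 + \frac{\gamma^{-2}[1 + m_1(\rho_{a_j})]}{\rho_{a_j} - \gamma^{-2}[1 + m_1(\rho_{a_j})]} \right) \frac{\gamma^{-2} m_5(\rho_{a_{j'}})}{(1 - \gamma^{-2} m_6(\rho_{a_{j'}}))^2}, \tag{3.37} \\
\rho_{jj'} &:= \gamma^{-2} m_3(\rho_{a_{j'}}) + \gamma^{-2} m_8(\rho_{a_j}, \rho_{a_{j'}}). \tag{3.38}
\end{align}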
Noting that, jointly in distribution, we have

\[
x_j \xrightarrow{d} \frac{1}{1 + \gamma^{-2} a_j m_3(\rho_{a_j})}\, G_{jj}^{(j)},
\]

and recalling the expression for $\cos \beta^{(j)}$ in (3.26), the proof is complete.

4 Proof of Central Limit Theorem 

In this section we prove a central limit theorem for the bilinear form. This is a 
separate result and can be used as a tool in the rest of the paper. 

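Theorem 4.1 Let $A_N(\ell) = \big( a_{uv}^{(N)}(\ell) \big)$, $\ell = 1, \ldots, K$, be $K$ sequences of $N \times N$ Hermitian matrices such that all the entries are bounded and the following limits exist in probability:

\begin{align}
\omega_{\ell\ell'} &= \lim_{N \to \infty} \frac{1}{N} \sum_{u=1}^{N} a_{uu}^{(N)}(\ell)\, a_{uu}^{(N)}(\ell'), \tag{4.1} \\
\theta_{\ell\ell'} &= \lim_{N \to \infty} \frac{1}{N} \sum_{u,v=1}^{N} a_{uv}^{(N)}(\ell)\, a_{vu}^{(N)}(\ell'), \tag{4.2} \\
\tau_{\ell\ell'} &= \lim_{N \to \infty} \frac{1}{N} \sum_{u,v=1}^{N} a_{uv}^{(N)}(\ell)\, a_{uv}^{(N)}(\ell'). \tag{4.3}
\end{align}

Also assume there exist constants $M > 0$ and $c > 0$ such that

\[
P\left( \max_{u,v=1}^{N} \big| a_{uv}^{(N)}(\ell) \big| > M \right) < \exp(-cN). \tag{4.4}
\]

Let $(x_i, y_i)_{1 \le i \le N}$ be a sequence of complex-valued i.i.d. vectors in $\mathbb{C}^{2K}$, independent of $\{A_N(\ell)\}_{\ell=1}^{K}$. Here $x_i, y_i \in \mathbb{C}^{K}$ and

\[
x_i = (x_{1i}, \ldots, x_{Ki})^T, \qquad y_i = (y_{1i}, \ldots, y_{Ki})^T.
\]

Let

\[
X(\ell) = (x_{\ell 1}, \ldots, x_{\ell N})^T, \qquad Y(\ell) = (y_{\ell 1}, \ldots, y_{\ell N})^T, \qquad \rho(\ell) = \mathbb{E}[\bar{x}_{\ell 1} y_{\ell 1}], \tag{4.5}
\]

and define

\[
Z_N(\ell) = \frac{1}{\sqrt{N}} \left[ X(\ell)^* A_N(\ell) Y(\ell) - \rho(\ell)\, \mathrm{tr}(A_N(\ell)) \right].
\]

Then as $N \to \infty$, $Z_N = (Z_N(\ell))_{\ell=1}^{K}$ converges weakly to a Gaussian random vector $W \in \mathbb{C}^{K}$ with moment generating function

\[
\mathbb{E}\, e^{c^T W} = \exp\left( \frac{1}{2} c^T B c \right), \qquad c \in \mathbb{C}^{K},
\]

where $B = B_1 + B_2 + B_3$ with

\begin{align}
B_1 &= \Big( \omega_{\ell\ell'} \big( \mathbb{E}[\bar{x}_{\ell 1} y_{\ell 1} \bar{x}_{\ell' 1} y_{\ell' 1}] - \rho(\ell)\rho(\ell') \big) \Big)_{\ell,\ell'=1}^{K}, \tag{4.6} \\
B_2 &= \Big( (\theta_{\ell\ell'} - \omega_{\ell\ell'})\, \mathbb{E}[\bar{x}_{\ell 1} y_{\ell' 1}]\, \mathbb{E}[\bar{x}_{\ell' 1} y_{\ell 1}] \Big)_{\ell,\ell'=1}^{K}, \tag{4.7} \\
B_3 &= \Big( (\tau_{\ell\ell'} - \omega_{\ell\ell'})\, \mathbb{E}[\bar{x}_{\ell 1} \bar{x}_{\ell' 1}]\, \mathbb{E}[y_{\ell 1} y_{\ell' 1}] \Big)_{\ell,\ell'=1}^{K}. \tag{4.8}
\end{align}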
(4.7) 
(4.8) 



Proof of Theorem 4.1. First we claim that we can assume, without loss of generality, that $A_N(\ell)$ is a sequence of non-random matrices such that

• The limits (4.1) to (4.3) hold with convergence in probability replaced by ordinary convergence.

• $\max_{u,v=1}^{N} \big| a_{uv}^{(N)}(\ell) \big| \le M$ for some constant $M$, uniformly for all $N$.

Indeed, if $A_N(\ell)$ is random, we know that for any subsequence of $A_N(\ell)$ there exists a sub-subsequence of $A_N(\ell)$ along which (4.1) to (4.3) hold in the sense of ordinary convergence. Also, using the Borel-Cantelli lemma, we pick the sub-subsequence such that all the elements are bounded. We can then work on the corresponding sub-subsequence $\{Z_N(\ell)\}_{\ell=1}^{K}$. By conditioning on $X_\eta$, we can treat $A_N(\ell)$ as deterministic matrices. Hence, if we can prove the theorem for deterministic matrices $A_N(\ell)$, then for any subsequence of $Z_N(\ell)$ there exists a sub-subsequence of $Z_N(\ell)$ that converges to the same limit, independent of the subsequence chosen. By this subsequence argument, the theorem also holds for random sequences $A_N(\ell)$.

Thus, we can safely assume that $A_N(\ell)$ is deterministic with bounded elements. The rest of the proof is very similar to that of Theorem 7.1 in [4]. Since our result is a generalization of that theorem, here we just point out the major differences between the two. Also, using truncation as in [4], we can assume without loss of generality that there exists a sequence $\epsilon_N \downarrow 0$ such that

\[
\|x_i\|_2 \vee \|y_i\|_2 \le \epsilon_N \cdot N^{1/4}, \qquad \forall\, 1 \le i \le N. \tag{4.9}
\]

Just as in [1], we turn to establishing the one-dimensional central limit theorem for the random variable

\[
\sum_{\ell=1}^{K} c_\ell\, X(\ell)^* A_N(\ell) Y(\ell).
\]

Define

\[
\xi_N = \frac{1}{\sqrt{N}} \sum_{\ell=1}^{K} c_\ell \left[ X(\ell)^* A_N(\ell) Y(\ell) - \rho(\ell)\, \mathrm{tr}\, A_N(\ell) \right] = \frac{1}{\sqrt{N}} \sum_{e} \psi_e. \tag{4.10}
\]

Here $e = (u, v) \in \{1, 2, \ldots, N\}^2$ and

\[
\psi_e = \begin{cases}
\sum_{\ell=1}^{K} c_\ell\, a_{uu}^{(N)}(\ell) \left[ \bar{x}_{\ell u} y_{\ell u} - \rho(\ell) \right], & e = (u, u), \\
\sum_{\ell=1}^{K} c_\ell\, a_{uv}^{(N)}(\ell)\, \bar{x}_{\ell u} y_{\ell v}, & e = (u, v),\ u \ne v.
\end{cases} \tag{4.11}
\]

For any fixed $k \ge 1$ we have

\[
N^{k/2}\, \mathbb{E}\, \xi_N^k = \sum_{e_1, \ldots, e_k} \mathbb{E}\, \psi_{e_1} \cdots \psi_{e_k} = \sum_{G} \mathbb{E} \prod_{e \in G} \psi_e = \sum_{G} \mathbb{E}\, \psi_G. \tag{4.12}
\]

Here the directed graph $G = G(V, E)$ is defined by the vertex set $V = \{1, 2, \ldots, N\}$ and the edge set $E$ such that $(u \to v) \in E$ if and only if $e = (u, v)$ appears in the summation.

We shall use the method of moments to prove the theorem. That is, we will analyze the contribution of all the terms $\mathbb{E}\, \psi_G$.

For each graph $G$ in (4.12), we can decompose it into several connected components. Just as in [4], these connected components can be classified into two types of subgraphs.

• Type-I subgraph.

Definition 4.1 If a connected component $C$ contains only one vertex, i.e., it contains only self-loops, then we call $C$ a Type-I subgraph. We define

$\mathcal{F}_1$ = the set of all Type-I subgraphs in $G$.
$m_1 = |\mathcal{F}_1|$ = the number of Type-I subgraphs in $G$.
$\mu_1, \ldots, \mu_{m_1}$ = the degrees of the vertices of these $m_1$ Type-I subgraphs.

If $\mu_j = 2$ for some subgraph of Type-I, then $\mathbb{E}\, \psi_G = 0$, hence it makes no contribution to $\mathbb{E}\, \xi_N^k$. On the other hand, if $\mu_j \ge 4$ for all $j = 1, \ldots, m_1$, then

\[
\left| \mathbb{E} \prod_{C \in \mathcal{F}_1} \psi_C \right| \le M \cdot (\epsilon_N N^{1/4})^{\sum_{j=1}^{m_1} (\mu_j - 4)}, \tag{4.13}
\]

where $M$ is a sufficiently large constant.

• Type-II subgraph.

Definition 4.2 If a connected component $C_s$ contains at least one arrow $u \to v$, then we call it a Type-II subgraph. We define

$\mathcal{F}_2$ = the set of all Type-II subgraphs in $G$.
$m_2 = |\mathcal{F}_2|$ = the number of Type-II subgraphs in $G$.
$u_s$ = the number of vertices of each subgraph $C_s \in \mathcal{F}_2$, $s = 1, 2, \ldots, m_2$.
$\gamma_{1s}, \ldots, \gamma_{u_s s}$ = the degrees of these $u_s$ vertices in $C_s$.

If $\gamma_{js} = 1$ for some $j$ and some $s$, then we also have $\mathbb{E}\, \psi_G = 0$, giving no contribution to the overall expectation $\mathbb{E}\, \xi_N^k$. On the other hand, if $\gamma_{js} \ge 2$ for all $j, s$, then we have

\[
\left| \mathbb{E} \prod_{C_s \in \mathcal{F}_2} \psi_{C_s} \right| \le M \cdot (\epsilon_N N^{1/4})^{\sum_{s=1}^{m_2} \sum_{j=1}^{u_s} (\gamma_{js} - 2)}. \tag{4.14}
\]

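Now define $\mathcal{G}$ to be the set of graphs such that

\begin{align*}
\mathcal{G} &= \mathcal{G}\big( m_1, \{\mu_j\}_{j=1}^{m_1}, m_2, \{u_s\}_{s=1}^{m_2}, \{\gamma_{js}\}_{1 \le j \le u_s,\, 1 \le s \le m_2} \big) \\
&= \{ G : G \text{ has } m_1 \text{ Type-I subgraphs, with degree } \mu_j \text{ of each vertex; also } G \text{ has } m_2 \text{ Type-II subgraphs, with } u_s \text{ vertices} \\
&\qquad \text{in each Type-II subgraph, whose degrees are given by } \{\gamma_{js}\}_{1 \le j \le u_s,\, 1 \le s \le m_2} \}.
\end{align*}

As a first observation, the number of different classes $\mathcal{G}$ is a bounded constant, independent of $N$. From our previous analysis, for any $G \in \mathcal{G}$ we must have

\[
\left| \mathbb{E}\, \psi_G \right| \le M \cdot (\epsilon_N N^{1/4})^{2k - 4m_1 - 2\sum_{s=1}^{m_2} u_s}. \tag{4.15}
\]

Here we used the equality $\sum_{j=1}^{m_1} \mu_j + \sum_{s=1}^{m_2} \sum_{j=1}^{u_s} \gamma_{js} = 2k$.

Next, we estimate the total number of graphs in $\mathcal{G}$, that is, we estimate $|\mathcal{G}|$. To get $G \in \mathcal{G}$, we need to pick $m_1$ vertices to form the Type-I subgraphs, having $O(N^{m_1})$ possibilities. Also we need to pick $m_2$ vertices to form the Type-II subgraphs, having $O(N^{m_2})$ possibilities. Hence $|\mathcal{G}| = O(N^{m_1 + m_2})$. Thus

\[
N^{-k/2} \sum_{G \in \mathcal{G}} \left| \mathbb{E}\, \psi_G \right| \le M \cdot \epsilon_N^{\,2k - 4m_1 - 2\sum_{s=1}^{m_2} u_s} \cdot N^{-\sum_{s=1}^{m_2} (u_s - 2)/2}. \tag{4.16}
\]

In order to have a non-negligible contribution to $\mathbb{E}\, \xi_N^k$, we need $\sum_{s=1}^{m_2} (u_s - 2)/2 \le 0$. However, we know $u_s \ge 2$. Hence one must have $u_s = 2$ for all $s$. In this case, since $\epsilon_N \to 0$, we need $2k - 4m_1 - 2\sum_{s=1}^{m_2} u_s \le 0$ to guarantee a non-negligible contribution. From $\gamma_{js} \ge 2$ and $\mu_j \ge 4$ we obtain

\[
2k = \sum_{j=1}^{m_1} \mu_j + \sum_{s=1}^{m_2} \sum_{j=1}^{u_s} \gamma_{js} \ge 4m_1 + 2\sum_{s=1}^{m_2} u_s.
\]

Thus, in order for (4.16) to be non-negligible, we necessarily need $u_s = 2$, $\mu_j = 4$ for all $1 \le j \le m_1$, and $\gamma_{js} = 2$ for all $1 \le j \le u_s$, $1 \le s \le m_2$.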
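The power counting behind this step can be checked mechanically. The sketch below combines the three factors named in the text, namely the normalization $N^{-k/2}$, the count $|\mathcal{G}| = O(N^{m_1 + m_2})$, and the bound $(\epsilon_N N^{1/4})^{2k - 4m_1 - 2\sum_s u_s}$, and confirms with exact rational arithmetic that the resulting power of $N$ is $-\sum_s (u_s - 2)/2$. The function name `n_exponent` and the parameter grid are hypothetical and chosen only for illustration.

```python
from fractions import Fraction
from itertools import product

def n_exponent(k, m1, u):
    """Return (power of N, power of eps_N) in
    N^{-k/2} * N^{m1 + m2} * (eps_N * N^{1/4})^{2k - 4*m1 - 2*sum(u)},
    where u lists the vertex counts u_s of the Type-II subgraphs and m2 = len(u).
    """
    m2 = len(u)
    eps_power = 2 * k - 4 * m1 - 2 * sum(u)
    power_of_n = Fraction(-k, 2) + m1 + m2 + Fraction(eps_power, 4)
    return power_of_n, eps_power

checked = 0
vertex_profiles = [(2,), (2, 2), (2, 3), (3, 4), (2, 2, 2)]
for k, m1, u in product(range(1, 9), range(0, 4), vertex_profiles):
    power_of_n, _ = n_exponent(k, m1, u)
    # The k and m1 dependence cancels; only the Type-II vertex counts remain.
    assert power_of_n == -sum(Fraction(us - 2, 2) for us in u)
    checked += 1

print("parameter combinations checked:", checked)
print("all u_s = 2 gives power", n_exponent(4, 0, (2, 2))[0])
print("one u_s = 3 gives power", n_exponent(4, 0, (2, 3))[0])
```

The power of $N$ is zero exactly when every Type-II component has two vertices, which is why only such graphs can survive in the limit.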

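In summary, just as in [4], we have proved that the non-negligible graphs consist only of the following three kinds of connected components.

• $k_1$ double loops $u \to u$, $u \to u$, with terms $\mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{uu}^{(N)}(\ell) \big( \bar{x}_{\ell u} y_{\ell u} - \rho(\ell) \big) \right]^2$.

• $k_2$ simple cycles $u \to v$, $v \to u$, with terms $\mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{uv}^{(N)}(\ell)\, \bar{x}_{\ell u} y_{\ell v} \cdot \sum_{\ell=1}^{K} c_\ell\, a_{vu}^{(N)}(\ell)\, \bar{x}_{\ell v} y_{\ell u} \right]$.

• $k_3$ double arrows $u \to v$, $u \to v$, with terms $\mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{uv}^{(N)}(\ell)\, \bar{x}_{\ell u} y_{\ell v} \right]^2$.

We must have $4(k_1 + k_2 + k_3) = 2k$, or $k = 2(k_1 + k_2 + k_3)$, which must be an even number. Let $k = 2p$ for $p \in \mathbb{N}^{+}$. Similar to [4], we have

\[
\mathbb{E}\, \xi_N^{2p} = \frac{1}{N^p} \sum_{k_1 + k_2 + k_3 = p} \frac{(2p)!}{2^p\, k_1!\, k_2!\, k_3!} \sum{}^{*}\, C_1 C_2 C_3 + o(1). \tag{4.17}
\]

Here the inner sum runs over the vertex sets

\[
\{u_j^{(1)}\}_{j=1}^{k_1} \subset \{1, \ldots, N\}, \qquad \{u_j^{(2)}, v_j^{(2)}\}_{j=1}^{k_2} \subset \{1, \ldots, N\}, \qquad \{u_j^{(3)}, v_j^{(3)}\}_{j=1}^{k_3} \subset \{1, \ldots, N\},
\]

and

\begin{align*}
C_1 &= \prod_{j=1}^{k_1} \mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{u_j^{(1)} u_j^{(1)}}^{(N)}(\ell) \big( \bar{x}_{\ell u_j^{(1)}} y_{\ell u_j^{(1)}} - \rho(\ell) \big) \right]^2, \\
C_2 &= \prod_{j=1}^{k_2} \mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{u_j^{(2)} v_j^{(2)}}^{(N)}(\ell)\, \bar{x}_{\ell u_j^{(2)}} y_{\ell v_j^{(2)}} \cdot \sum_{\ell=1}^{K} c_\ell\, a_{v_j^{(2)} u_j^{(2)}}^{(N)}(\ell)\, \bar{x}_{\ell v_j^{(2)}} y_{\ell u_j^{(2)}} \right], \\
C_3 &= \prod_{j=1}^{k_3} \mathbb{E} \left[ \sum_{\ell=1}^{K} c_\ell\, a_{u_j^{(3)} v_j^{(3)}}^{(N)}(\ell)\, \bar{x}_{\ell u_j^{(3)}} y_{\ell v_j^{(3)}} \right]^2.
\end{align*}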
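Hence we have

\begin{align*}
\mathbb{E}\, \xi_N^{2p} &= (2p - 1)!! \sum_{k_1 + k_2 + k_3 = p} \frac{p!}{k_1!\, k_2!\, k_3!}\, D_1^{k_1} D_2^{k_2} D_3^{k_3} + o(1) \\
&= (2p - 1)!!\, (D_1 + D_2 + D_3)^p + o(1).
\end{align*}

Here

\begin{align*}
D_1 &= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'}\, \mathbb{E} \left[ \big( \bar{x}_{\ell 1} y_{\ell 1} - \rho(\ell) \big) \big( \bar{x}_{\ell' 1} y_{\ell' 1} - \rho(\ell') \big) \right] \cdot \frac{1}{N} \sum_{u=1}^{N} a_{uu}^{(N)}(\ell)\, a_{uu}^{(N)}(\ell') \\
&= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'} \left( \mathbb{E}[\bar{x}_{\ell 1} y_{\ell 1} \bar{x}_{\ell' 1} y_{\ell' 1}] - \rho(\ell)\rho(\ell') \right) \omega_{\ell\ell'} + o(1), \\
D_2 &= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'}\, \mathbb{E}[\bar{x}_{\ell 1} y_{\ell' 1}]\, \mathbb{E}[\bar{x}_{\ell' 1} y_{\ell 1}] \cdot \frac{1}{N} \sum_{u \ne v} a_{uv}^{(N)}(\ell)\, a_{vu}^{(N)}(\ell') \\
&= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'}\, \mathbb{E}[\bar{x}_{\ell 1} y_{\ell' 1}]\, \mathbb{E}[\bar{x}_{\ell' 1} y_{\ell 1}]\, (\theta_{\ell\ell'} - \omega_{\ell\ell'}) + o(1), \\
D_3 &= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'}\, \mathbb{E}[\bar{x}_{\ell 1} \bar{x}_{\ell' 1}]\, \mathbb{E}[y_{\ell 1} y_{\ell' 1}] \cdot \frac{1}{N} \sum_{u \ne v} a_{uv}^{(N)}(\ell)\, a_{uv}^{(N)}(\ell') \\
&= \sum_{\ell=1}^{K} \sum_{\ell'=1}^{K} c_\ell c_{\ell'}\, \mathbb{E}[\bar{x}_{\ell 1} \bar{x}_{\ell' 1}]\, \mathbb{E}[y_{\ell 1} y_{\ell' 1}]\, (\tau_{\ell\ell'} - \omega_{\ell\ell'}) + o(1).
\end{align*}

Since $D_1 + D_2 + D_3 = c^T B c + o(1)$, the even moments of $\xi_N$ converge to those of a centered Gaussian variable with variance $c^T B c$, while the odd moments vanish asymptotically. This completes the proof of the theorem.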
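The last step of the moment computation combines two elementary identities: $(2p)!/(2^p\, p!) = (2p-1)!!$ and the multinomial theorem. The sketch below checks with exact rational arithmetic that summing $\frac{(2p)!}{2^p k_1! k_2! k_3!} D_1^{k_1} D_2^{k_2} D_3^{k_3}$ over $k_1 + k_2 + k_3 = p$ gives $(2p-1)!!\,(D_1 + D_2 + D_3)^p$, the $2p$-th moment of a centered Gaussian with variance $D_1 + D_2 + D_3$. The values assigned to `d1`, `d2`, `d3` and the function names are arbitrary illustrations.

```python
from fractions import Fraction
from math import factorial

def odd_double_factorial(p):
    """Return (2p - 1)!! = 1 * 3 * ... * (2p - 1)."""
    out = 1
    for i in range(1, 2 * p, 2):
        out *= i
    return out

def moment_sum(p, d1, d2, d3):
    """Sum of (2p)! / (2^p k1! k2! k3!) * d1^k1 * d2^k2 * d3^k3 over k1 + k2 + k3 = p."""
    total = Fraction(0)
    for k1 in range(p + 1):
        for k2 in range(p - k1 + 1):
            k3 = p - k1 - k2
            coeff = Fraction(factorial(2 * p),
                             2 ** p * factorial(k1) * factorial(k2) * factorial(k3))
            total += coeff * d1 ** k1 * d2 ** k2 * d3 ** k3
    return total

# Arbitrary illustrative values for the three limiting contributions.
d1, d2, d3 = Fraction(1, 2), Fraction(1, 3), Fraction(1, 5)

results = {p: moment_sum(p, d1, d2, d3) for p in range(1, 6)}
expected = {p: odd_double_factorial(p) * (d1 + d2 + d3) ** p for p in range(1, 6)}

for p in sorted(results):
    print(p, results[p], expected[p])
```

Because the arithmetic is exact, agreement here is an identity check rather than a numerical approximation.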
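As a corollary, Theorem 4.1 can be applied to the situation where $A_N(\ell)$, $x_i$, $y_i$ are all real matrices or vectors. In this case, we have $\theta_{\ell\ell'} = \tau_{\ell\ell'}$, and the following result holds.

Theorem 4.2 Under the setting of Theorem 4.1, suppose in addition that $A_N(\ell) \in \mathbb{R}^{N \times N}$ and $x_i, y_i \in \mathbb{R}^{K}$, and that the following limits exist:

\begin{align}
\omega_{\ell\ell'} &= \lim_{N \to \infty} \frac{1}{N} \sum_{u=1}^{N} a_{uu}^{(N)}(\ell)\, a_{uu}^{(N)}(\ell'), \tag{4.18} \\
\theta_{\ell\ell'} &= \lim_{N \to \infty} \frac{1}{N} \sum_{u,v=1}^{N} a_{uv}^{(N)}(\ell)\, a_{uv}^{(N)}(\ell'). \tag{4.19}
\end{align}

Then the vector $Z_N$ converges weakly to a Gaussian random vector $W \in \mathbb{R}^{K}$ with covariance matrix $D = D_1 + D_2$, where

\begin{align}
D_1 &= \Big( \omega_{\ell\ell'} \big( \mathbb{E}[x_{\ell 1} x_{\ell' 1} y_{\ell 1} y_{\ell' 1}] - \rho(\ell)\rho(\ell') \big) \Big)_{\ell,\ell'=1}^{K}, \tag{4.20} \\
D_2 &= \Big( (\theta_{\ell\ell'} - \omega_{\ell\ell'}) \big( \mathbb{E}[x_{\ell 1} y_{\ell' 1}]\, \mathbb{E}[x_{\ell' 1} y_{\ell 1}] + \mathbb{E}[x_{\ell 1} x_{\ell' 1}]\, \mathbb{E}[y_{\ell 1} y_{\ell' 1}] \big) \Big)_{\ell,\ell'=1}^{K}. \tag{4.21}
\end{align}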

As a second simple corollary, we have the following result on convergence in probability.

Corollary 4.1 In the setting of Theorem 4.1, for any $0 < k < 1/2$ we have

\[
N^{-1/2 - k} \left[ X(\ell)^* A_N(\ell) Y(\ell) - \rho(\ell)\, \mathrm{tr}(A_N(\ell)) \right] \to 0 \tag{4.22}
\]

in probability. A similar result also holds in the real case.

5 Conclusion 

In this paper, we studied the spiked population model to establish the asymptotic behavior of the sample eigenvalues and sample eigenvectors. The result is universal in the sense that we did not impose any strong assumptions on the distribution of the sample points $x_i$. We showed that the sample eigenvalues are asymptotically jointly normal, with the covariance matrix explicitly calculated. We also showed that the entries of the sample eigenvectors are asymptotically jointly normal. Finally, the angle between the sample eigenvector and the true eigenvector converges to a non-trivial constant, with central-limit-theorem-style local fluctuation. All the covariance matrices of the limiting Gaussian distributions have been explicitly calculated. We showed that they depend only on the first four moments of the distribution of the sample points $x_i$.

In the special case where the sample points $x_i$ are Gaussian distributed, we showed as a corollary that the sample eigenvalues in different packs are asymptotically independent. Moreover, the local fluctuation of the eigenvector is independent of the corresponding eigenvalue as well.